[57] ABSTRACT

A system (10) for analyzing a data file containing a plurality 
of data records with each data record containing a plurality 
of parameters is provided. The system (10) includes an input 
(40) for receiving the data file and a data processor (32) 
having at least one of several data processing functions. 
These data processing functions include, for example, a 
segmentation function (34) for segmenting the data records
into a plurality of segments based on the parameters. The 
data processing functions also include a clustering function 
(36) for clustering the data records into a plurality of clusters 
containing data records having similar parameters. A pre- 
diction function (38) for predicting expected future results 
from the parameters in the data records may also be provided 
with the data processor (32). 
DATA ANALYSIS SYSTEM AND METHOD

TECHNICAL FIELD OF THE INVENTION

This invention relates in general to the field of data analysis, and more particularly to a statistical analysis system and method for analyzing data.

BACKGROUND OF THE INVENTION

In recent years advancements in technology have reduced the cost of computers to the point where nearly every event in one's day is recorded by a computer. Events recorded by computer are numerous and include, for example, every transaction made by an individual. Computers store the data associated with the transactions they process and this results in sometimes large database(s) of information.

The problem, therefore, arises of how to make efficient use of the tremendous amount of information in these database(s). When the number of records in a database rises to a sufficiently large level, simply sorting the information in the database provides no meaningful results. While statistical analysis of the records in a database may yield useful information, such analysis generally requires that persons with advanced training in math or computer science perform the analysis and understand the results of the analysis. Additionally, translation of the statistical analysis of the information in a large database into a form that may be useful for such activities as marketing is also difficult. Such a situation may prevent the effective use of the information in a database and preclude the use of a possible valuable resource.

SUMMARY OF THE INVENTION

In accordance with the present invention, a data analysis system and method are provided that substantially eliminate or reduce disadvantages and problems associated with previously developed data analysis tools.

One aspect of the present invention provides a system for analyzing a data file containing a plurality of data records with each data record containing a plurality of parameters. The system includes an input for receiving the data file and a data processor having at least one of several data processing functions. These data processing functions include, for example, a segmentation function for segmenting the data records into a plurality of segments based on the parameters. The data processing functions also include a clustering function for clustering the data records into a plurality of clusters containing data records having similar parameters. The clustering function can also generate cluster maps depicting the number of records in each cluster. A prediction function for predicting expected future results from the parameters in the data records may also be provided with the data processing function.

Another aspect of the present invention provides a system for analyzing a data file containing a plurality of customer data records, each data record containing a plurality of customer parameters. The system includes an input for receiving the data file and a data processor for processing the data records. The data processor includes a segmentation function for segmenting the customer data records into a plurality of segments based on the parameters. The data processor also includes a clustering function for clustering the customer data records into a plurality of customer groups having similar parameters. A prediction function for predicting customer behavior from the customer data records is also provided with the data processor.

Yet another aspect of the present invention provides a method for analyzing a data file containing a plurality of data records, each data record containing a plurality of parameters. The method further includes the steps of inputting the data file and processing the data file. Processing the data file includes segmenting the data records into a plurality of segments based on the parameters, clustering the data records into a plurality of clusters containing data records having similar parameters, and predicting expected future results from the parameters in the data records.

The present invention provides several technical advantages. One technical advantage of the present invention is that it provides a user-friendly computer system and method for performing statistical analysis on the information within a database.

Another technical advantage of the present invention is that it provides several statistical analysis tools within a single computer system. Each tool may be used to perform statistical analysis on the information within a database. Additionally, the results of the analysis from several tools may be combined for enhanced statistical data analysis.

Yet another technical advantage of the present invention is that it may be used to identify complex patterns and relationships within large quantities of information. By defining these patterns and relationships in, for example, customer information, targeted marketing or promotion activities may be developed.

An additional technical advantage of the present invention is that it may be used in developing a marketing program for identifying customers that are most likely to respond to the marketing program. Moreover, it may be used to profile customer groups to identify socio-demographic or behavioral characteristics within the customer groups. It also provides for identifying significant associations between customer behavior, lifestyle, or attitudinal features, and may be used to identify significant associations between customer purchase preferences.

Another technical advantage of the present invention is that it provides for segmenting records into logical groups.

Yet another technical advantage of the present invention is that it provides for clustering records into statistically significant groups.

Yet another technical advantage of the present invention is that it may be used to predict customer or potential customer behavior, including, for example, propensity to respond to direct mail or telemarketing, product preference, profitability, credit risk, and probability of attrition. The present invention also provides a technical advantage of identifying "unusual" customers and potentially fraudulent behavior by those customers.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings in which like reference numbers indicate like features and wherein:

FIG. 1 shows an exemplary system for data analysis in accordance with concepts of the present invention;

FIG. 2 is an exemplary data input window for use with the present invention;

FIG. 3 is an exemplary flowchart for rule based segmentation in accordance with the present invention;

FIG. 4 illustrates a rule based segmentation window in accordance with one aspect of the present data analysis invention;
FIG. 5 shows a rule based segmentation setup window for use with the present invention;

FIG. 6 illustrates an exemplary window for editing a bin within the rule based segmentation function of the present data analysis invention;

FIG. 7 depicts an example of a parameter distribution window including histograms available with the rule based segmentation function in accordance with the present invention;

FIG. 8 is an exemplary flowchart for a neural clustering function available with the present invention;

FIGS. 9A and 9B illustrate a clustering process in accordance with the present system;

FIG. 10 illustrates an exemplary neural clustering window for use with the neural clustering function available with the present data analysis system;

FIG. 11 shows an exemplary dialog window for setting up a neural clustering run in accordance with the present data analysis invention;

FIG. 12 depicts an exemplary parameter selection window for use with the neural clustering function available with the present invention;

FIG. 13 illustrates an exemplary clustering analysis window in accordance with the present data analysis invention;

FIG. 14 depicts an exemplary graph options dialog box available with the neural clustering function of the present data analysis invention;

FIG. 15 illustrates an exemplary parameter distribution window with histograms available with the present invention;

FIGS. 16A and 16B illustrate a multi-layer perceptron network and neuron, respectively, used in one embodiment of the neural prediction function of the present invention;

FIG. 17 illustrates an exemplary flowchart for the neural prediction function in accordance with the present invention;

FIG. 18 shows an exemplary neural network file specification window for use with the neural prediction function of the present invention;

FIG. 19 depicts an exemplary input file selection dialog box for use with the neural prediction function of the present invention;

FIG. 20 shows an exemplary specify parameters window for use with the neural prediction function available with the present data analysis invention;

FIG. 21 illustrates an exemplary encode input parameter window for the neural prediction function of the present invention;

FIG. 22 depicts an exemplary run neural network window for use with the neural prediction function of the present invention;

FIG. 23 shows an exemplary edit network configuration window for use with the neural prediction function of the present invention;

FIG. 24 shows an exemplary neural prediction edit run window for use with the neural prediction window of the present data analysis invention;

FIG. 25 illustrates an exemplary text neural network results window in accordance with the neural prediction function of the present invention;

FIG. 26 shows an exemplary graphical neural network results window for the neural prediction function of the present invention;

FIG. 27 depicts an exemplary dialog box for defining a graph's characteristics generated with the neural prediction function of the present invention;

FIG. 28 illustrates an exemplary post-network graphical analysis window with the neural prediction function of the present system and method;

FIG. 29 illustrates a data merge operation in accordance with the present invention;

FIG. 30 shows an exemplary data merge window for use with a data merge in accordance with the present invention;

FIG. 31 illustrates a data append operation in accordance with the present invention;

FIG. 32 depicts an exemplary data append window for use with a data append in accordance with the present invention;

FIG. 33 shows an exemplary data paring window for use with a data paring in accordance with the present invention;

FIG. 34 illustrates an exemplary data output window for outputting data in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Preferred embodiments of the present invention are illustrated in the FIGUREs, like numerals being used to refer to like and corresponding parts of the various drawings.

FIG. 1 shows data analysis system 10 embodying concepts of the present invention. Data analysis system 10 preferably includes processor 12, random access memory (RAM) 14, read only memory (ROM) 16, pointing device 18, keyboard 20, and various output device(s). The output device(s) for system 10 may include, for example, external memory devices such as tape drive 22 and disk drive(s) 24, printer 26, and display 28. Data analysis system 10 also preferably includes modem 30 for making connections to external communication mediums. Data analysis system 10 is not limited to any particular hardware embodiment and may be implemented in one or more computer systems. Processor 12 in system 10 is adapted to execute many types of computer instructions in many computer languages for implementing the functions available with data analysis system 10.

Data analysis system 10 in FIG. 1 provides an advanced statistical analysis tool for analyzing databases containing many different types of data. Although system 10 may be used for analyzing databases containing a variety of information, system 10 has been successfully implemented and has been found to be particularly useful in analyzing customer databases. Data analysis system 10 may provide significant benefits with the capability to identify complex patterns and relationships within large quantities of information. To that end, system 10 includes several functions.

System 10 preferably includes data processor 32 that is supported by processor 12. Within data processor 32 are preferably rule based segmentation function 34, neural clustering function 36, and neural prediction function 38. Data processor 32 uses data acquisition and output function 40 and data management function 42 to receive and manipulate data in performing data analysis. Such data is typically found in one or more database(s) 44 that may be stored on tape drive 22 or disk drive(s) 24.

Data acquisition and output function 40 is responsible for receiving data from database(s) 44 and formatting the data for processing by data processor 32. In one embodiment of the present invention, data acquisition and output function 40 receives customer data in a flat ASCII format from database(s) 44 and converts it into a concise internal binary form for use by data processor 32. Data acquisition and output function 40 preferably includes a data dictionary
function that allows for setting up and customizing parameter names for all of the parameters in a given database and will be described in discussions relating to FIG. 2.

Data management function 42 of data analysis system 10 allows for concatenating different sets of records, either different sets of records with the same data fields or different sets of fields for the same records. Data management function 42 allows for pruning out redundant records in a data set and converting binary records into an appropriate format for use by other systems.

Data processor 32 preferably includes several functions for performing data analysis. Rule based segmentation function 34 within data processor 32 preferably provides a mixture of query, segmentation, and statistical analysis capabilities. Rule based segmentation function 34 may provide a flexible and powerful facility for the investigation of data. Rule based segmentation function 34 may provide statistical information on each parameter in a data set for all records in the data set or for a given record segment. The segmentation tool also allows for splitting data into a set of hierarchically organized logical groups or tree structures. The segmentation process may be controlled by simple rules specified in "English-Like" tests for each branch in the hierarchy. The segmentation logic in rule based segmentation function 34 is easy to understand and modify and can be interactively modified for further pruning or further segmenting of the records in a data set to create a structure of any degree of complexity. This segmentation capability, combined with a statistical analysis capability, provides an efficient, flexible, and interactive system and method for analyzing and partitioning large quantities of data.

Data processor 32 within data analysis system 10 also preferably includes neural clustering function 36. Neural clustering function 36 clusters records into statistically significant groups of similar records. This function can identify the characteristic profiles of the groups and ranks the defining parameters in accordance with significance and differences from the population average. This capability is a powerful and computationally efficient form of statistical clustering and provides a method of discovering discriminatory patterns and associations within large quantities of unexplored information. Previously unknown relationships in the data may be uncovered and expected relationships verified quickly and easily with neural clustering function 36 of the present invention.

Neural prediction function 38 within data processor 32 provides the capability to predict future behavior or relationships of the items represented by a particular data set. Using a statistical machine learning technology that has the ability to learn from historical behavior stored in a database, neural prediction function 38 can be used to predict many aspects of behavior for which records of historical behavior are held in a data set.

While data processor 32 shown in FIG. 1 includes rule based segmentation function 34, neural clustering function 36, and neural prediction function 38, the present data analysis system and method is not limited to these functions. Some embodiments of the present invention may include only one of these functions while other embodiments may include additional functions not shown in FIG. 2. The number and type of data processing functions in system 10 may therefore be varied without departing from the spirit and scope of the present invention.

Data analysis system 10 may be used to analyze database(s) of information of many types and is not limited to any particular database content. Data analysis system 10 is particularly beneficial in analyzing customer databases that include information on the types of products purchased, frequency of purchase, quantity of purchase, and other general information on customers, e.g., age, gender, marital status, etc. An example of such a database is the demographic and lifestyle database available from NDL International. In further describing data analysis system 10, reference will be made to customer databases, but the present data analysis system may be used for analyzing many different databases of information without departing from the spirit and scope of the present invention. One embodiment of data analysis system 10 is available from Electronic Data Systems Corporation under the trade name AcuSTAR.

As a first step in using data analysis system 10 in FIG. 1, an input data set or file may be retrieved from database(s) 44 stored in tape drive 22 or disk drive(s) 24. As previously stated, data acquisition and output function 40 of system 10 provides the necessary data input capability to convert raw data in database(s) 44 into a form that can be used by data processor 32. In one embodiment of data analysis system 10, data in database(s) 44 is in ASCII data flat file format. This means that the data is provided in the form:

Record 1   a b c d . . .
Record 2   a b c d . . .
Record 3   a b c d . . .

The data may be provided in any of several ways: single-space separated, comma-separated, or tab-separated format.

In inputting the data in database(s) 44, data acquisition and output function 40 converts the data into an appropriate binary format. In one embodiment of [...] parameters in a binary data file. Data files input by data acquisition and output function 40 may be used by data processor 32.

FIG. 2 illustrates an exemplary data input window 45 that may be used to convert data in database(s) 44 into a format that may be used by data processor 32. Processor 12 generates window 45, as well as the other windows described hereinafter, on display 28 using standard graphical user interface (GUI) protocols and techniques.

FIG. 2 also illustrates main toolbar 46 that is generally available with the windows provided by data analysis system 10. Toolbar 46 provides access to the functions available within data analysis system 10 of the present invention. The buttons in main toolbar 46, as well as all other buttons described hereinafter, may be selected and activated using standard "select and click" techniques with pointing device 18 and display 28. The number and design of the buttons shown for toolbar 46 in FIG. 2 are exemplary of the buttons that may be included with toolbar 46. The number and design of the buttons may be modified as necessary without departing from the spirit and scope of the present invention.

Main toolbar 46 includes data input button 47 that may be selected to access the data input capability of data acquisition and output function 40. Rule based segmentation button 48 may be selected to access rule based segmentation function 34. Clustering buttons 49 may be selected to access neural clustering function 36. Prediction buttons 50 may be selected to access neural prediction function 38. Data merge button 51 may be selected to access a data merge capability within data management function 42. Data append button 52
provides access to a data append function within data management function 42. Data paring button 53 may be selected to access a data paring capability within data management function 42. Data output button 54 may be selected to output a data file via data acquisition and output function 40 once the results of the processing of a data file with data processor 32 are complete. Also, within main toolbar 46 is exit button 55, which may be selected at any time to exit data analysis system 10.

Data input window 45 in FIG. 2 also includes data input toolbar 56. Toolbar 56 includes initiate new input configuration button 57, open existing input data configuration button 58, save data input configuration 59, save data configuration as button 60, run data input function button 61, stop data input button 62, and close data input function button 63. The buttons in toolbar 56 provide a simple method for entering standard commands when inputting a data file.

Data input window 45 in FIG. 2 preferably includes input file section 64. Input file section 64 includes input filename field 64a and browse button 64b that may be selected to view a list of available input files. Input file section 64 also preferably includes start line field 64c that specifies the line in the input file where data begins. Delimiter field 64d identifies the type of delimiter used in the input file, and parameters field 64e indicates the number of parameters in the input file. By selecting configure button 64f in input file section 64, data acquisition and output function 40 will attempt to read the format of the selected data file to complete the information in start line field 64c, delimiter field 64d, and parameters field 64e. Selecting view button 64g in input file section 64 displays in message section 65 of window 45 the first several records in the input data file. An example of the type of data to be provided in message section 65 is shown in FIG. 2.

Data input window 45 also includes parameter name section 66 that allows for associating a textual identifier, i.e., a name, with each column of data in the input data file. In one embodiment of the present invention, a default name is given to each column of data in the form of PARAM00N, where N is the Nth data column. Parameter names can be generated in at least two ways, either from a parameter name file, which includes a name for each data parameter, or via keyboard 20. An example of this is illustrated in parameter name list section 66a shown in FIG. 2. In order to use a file for the names of the parameters in the input file, use file checkbox 66b is selected, and by selecting browse button 66c a dialogue window will be produced for selecting a file from a preexisting list of files. The selected file for the parameter names for the input file appears in file name field 66d.

Alternatively, parameter names for the input file may be created with keyboard 20. In this alternate method, each default parameter name is selected in parameter name list section 66a and a new name may be entered in new name field 66e with keyboard 20 and accepted by selecting replace button 66f.

Data input window 45 in FIG. 2 also preferably includes processing range section 67 that allows for specifying whether the whole or only a portion of the input file is to be processed. To process the entire input file, whole input file checkbox 67a is selected. Alternatively, if only a portion of the input file is to be processed, a start record may be input to start field 67b and an end record may be input to end field 67c. To process to the end of the input file, end file checkbox 67d may be selected.

Using the input function of data acquisition and output function 40, a data file in database(s) 44 can be processed appropriately for further use with data analysis system 10.

FIG. 3 shows an exemplary flowchart for rule based segmentation function 34 of data analysis system 10. As previously noted, rule based segmentation function 34 allows for applying flexible rule based segmentation techniques to organize data hierarchically. When operating on customer databases, rule based segmentation function 34 provides for market segmentation and customer scoring. By applying rules that may be nested upon each other, the data may be split into different segments. Statistics on the data, e.g., occupancy, mean, and minimum and maximum values for each segment, may then be examined, histograms may be plotted, and these results analyzed.

One embodiment of rule based segmentation function 34 in accordance with the present invention has three main functions. The first function provides for subdivision of database records, e.g., customer population, into a hierarchy of logical segments. The second function provides for identification of high level statistics for each segment, and the third provides for identifying detailed statistical distributions for each segment.

In FIG. 3 rule based segmentation function 34 begins at step 68 whenever, for example, rule based segmentation button 48 on main toolbar 46 is selected. At step 69 the data to be analyzed with rule based segmentation function 34 is appropriately formatted as described in discussions relating to FIG. 2. Proceeding to step 70, an initial analysis of the data is accomplished that provides basic statistics on the data file, including, for example, a mean, standard deviation, and minimum and maximum values for each field in the data file.

Next, at step 71 in rule based segmentation function 34 as depicted in FIG. 3, initial segmentation of the data file may be accomplished. For example, when dealing with a customer database it may be desirable to segment customer records into married and non-married customers. To do this, a user simply defines a two branch segment to be created from the total population of records, the first branch on this segment being defined by the rule "MARRIED-0" with the second segment having all remaining non-married customers.

Once the initial segmentation at step 71 is complete, expanding the initial segmentation is possible. This is a simple process with rule based segmentation function 34 of data analysis system 10 of the present invention and is accomplished by moving to any point in an existing segment and either inserting or removing branches on that segmentation level to a further level. Examples of further segmentation in accordance with step 72 in FIG. 3 will be described in discussions relating to FIG. 4. Once the further segmentation at step 72 is complete, statistics on any parameter for all segments or for comparing the parameter distribution between any two segments may be viewed on display 28. Once the desired segmentation is complete, rule based segmentation function 34 is exited at step 73.

Rule based segmentation function 34 may be used for several purposes that include, for example, selectively partitioning the data for more manageable analyses, examining trends in the data, gaining an intuitive feel for the content of the data, excluding rogue samples in the data from being included in any predictive or clustering models, examining the results of neural clustering function 36, e.g., occupancy and profile of a particular set of clusters, and examining the output and distribution from neural prediction function 38.

FIG. 4 shows an exemplary rule based segmentation window 74. Window 74 preferably includes toolbar 80 containing several buttons for providing predetermined
commands within rule based segmentation function 34. Rule
based segmentation toolbar 80 includes initiate new input
configuration button 82, open existing configuration button
84, save data input configuration button 86, and save data
input configuration as button 88.

Rule based segmentation toolbar 80 includes run button 
90 and stop button 92. Toolbar 80 also preferably includes 
display summary file button 94 that may be selected to 
display a summary on a completed segmentation run on a 
given data set. Histogram plotter button 96 in toolbar 80 may 
be selected to prepare histograms for the segmentation 
configuration on a given data set. Also, rule based segmen- 
tation toolbar 80 includes exit button 98, which may be 
selected at any time to exit rule based segmentation function 
34. 

Rule based segmentation window 74 as shown in FIG. 4 
also includes segmentation results section 100 providing an 
example of a segmentation run on a data file. Section 100 
includes information on the segment number (Bin), segment 
test (Test Name), size of the segment (Size), percent of the 
total segment (%Total), percent of the Parent Segment
(%Parent), mean for the segment (Mean), the segment's
standard deviation (SD), minimum value for the segment
(Min), and the maximum value for the segment (Max). In the
example shown in FIG. 4, the data file includes 20,000
records as indicated by the Size for All Records Bin 0.
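
The columns of segmentation results section 100 can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not the patented implementation: the function name `segment_row`, the sample values, and the choice of a population standard deviation for the SD column are all hypothetical, since the text does not say how SD is computed.

```python
from statistics import mean, pstdev

def segment_row(bin_no, test_name, values, parent_size, total_size):
    """Build one row of a segmentation results table for a single parameter.

    values holds the parameter values of the records in this segment.
    """
    size = len(values)
    return {
        "Bin": bin_no,
        "Test Name": test_name,
        "Size": size,
        "%Total": 100.0 * size / total_size,    # share of all records
        "%Parent": 100.0 * size / parent_size,  # share of the parent segment
        "Mean": mean(values),
        "SD": pstdev(values),  # assumption: population standard deviation
        "Min": min(values),
        "Max": max(values),
    }

# Hypothetical data: four records in Bin 0, two of which fall in a child bin.
ages = [20, 30, 40, 50]
married_ages = [30, 50]
all_row = segment_row(0, "All Records", ages, len(ages), len(ages))
child_row = segment_row(1, "MARRIED", married_ages, len(ages), len(ages))
```

For Bin 0 the parent and the total are the same population, so both percentage columns read 100.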

A file may be selected for segmentation with rule based
segmentation function 34 by selecting setup button 102 in
window 74. Selecting setup button 102 activates rule based
segmentation setup window 104 shown in FIG. 5.

FIG. 5 shows an exemplary rule based segmentation setup
window 104 for selecting a file for processing in rule based
segmentation function 34. Window 104 allows for specifying
the input file for processing in input file field 106. A
listing of available data files may be accessed by selecting
browse button 106a. The portion or range of the file to be
processed may be specified in range input field 108. Select
button 109 may be chosen to select a range within the input
file to be processed.

Window 104 also preferably includes summary file
checkbox 110, which may be selected to generate a summary
file for the segmentation process. The summary file name
may be input to summary file name field 110a, and a list of
potential summary file names may be viewed by selecting
browse button 110b. Window 104 also includes produce
output data files checkbox 111, which when selected, causes
rule based segmentation function 34 to create output data
files containing the results of the segmentation process. As
shown in the example of FIG. 5, the file in input file field 106
is "C:\ABaDATA.BDT," and the range for scanning this
file in range input field 108 is the "Whole file."

Dialogue box 104 in FIG. 5 also includes bin type 
selection 112. In segmenting the data in a data file with rule 
based segmentation function 34, the records in the data file 
are sorted into segments or bins. These bins may be
exclusive or non-exclusive. The default bin type calls for
exclusive bins. Exclusive bins are completely separate in
terms of membership. Therefore, with exclusive bins, each
member of a data file can only be in one bin. Alternatively,
when using non-exclusive bins, individual members can occupy
one or more bins.
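
By way of a non-limiting illustration, the exclusive and non-exclusive bin types may be sketched as follows. The function name `segment`, the use of Python callables as bin tests, the first-match rule for exclusive bins, and the catch-all bin named "Remainder" are assumptions made for this sketch only.

```python
def segment(records, tests, exclusive=True, remainder_name="Remainder"):
    """Sort records into named bins according to one logical test per bin.

    tests maps a bin name to a function returning True for member records.
    Records matching no test are collected in the catch-all bin.
    """
    bins = {name: [] for name in tests}
    bins[remainder_name] = []
    for record in records:
        matched = False
        for name, test in tests.items():
            if test(record):
                bins[name].append(record)
                matched = True
                if exclusive:
                    break  # assumed rule: the first matching bin wins
        if not matched:
            bins[remainder_name].append(record)
    return bins


records = [{"AGE": 20}, {"AGE": 30}, {"AGE": 70}]
tests = {
    "Young": lambda r: r["AGE"] < 35,
    "Adult": lambda r: 25 <= r["AGE"] < 65,
}
exclusive_bins = segment(records, tests, exclusive=True)
shared_bins = segment(records, tests, exclusive=False)
```

With exclusive bins the record with AGE 30 lands only in "Young"; with non-exclusive bins the same record occupies both "Young" and "Adult".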

In rule based segmentation function 34, each level of 
segmentation may include an arbitrary number of bins. This 
is achieved by defining a logical test for each bin, except for 
the remainder bin, which contains all members that do not 
fall within the other specified bins. The remainder bin can be
renamed at any time, saved to a file, or segmented further;
however, it preferably should not be deleted.

Several actions may be performed on bins in rule based
segmentation function 34. For example, a name may be 
defined for each bin using a string of characters. A logical 
test for the bin may be specified. A bin may be saved to a file 
for use in later analysis. Also, bins at the same level may be 
added together, and another bin at the same level may be 
inserted before the current bin. Bins may be segmented 
further and a bin may be deleted along with all of its 
"dependent bins," except for the remainder bin. 

As previously noted rule based segmentation window 74 
includes segmentation results 100 showing the results of a 
segmentation configuration applied to a particular data set. 
The example segmentation shown in FIG. 4 includes 14 
bins. Bin 0 is the "All Records" bin and contains all 
members of the data set. All segmentation of the data set is 
therefore performed on Bin 0. The levels of segmentation 
are indicated in segmentation results 100 by bin number and 
test name with bins at the same level being tabbed over the 
same distance under Bin 0 All Records. Therefore, in the 
example shown in FIG. 4, the All Records bin was initially 
segmented into Bin 1 "Male" and Bin 8 "Female". From 
these initial segmentation levels, the male and female bins 
were further segmented into "Unmarried" and "Married"
bins, which in turn were each further segmented into 
"Young" and "Old" bins. 

For each bin in window 74, the number of members in that
bin is shown in the "Size" column. The percentage of the
total number of records that the members of a particular bin
represent is shown in the "%Total" column. For those bins
having "parent-bins," i.e., all bins in the example of FIG. 4
except Bin 0 All Records, the percentage the bin represents
with respect to its parent bin is shown in the "%Parent"
column. For each bin, the mean ("Mean"), standard deviation
("SD"), and minimum ("Min") and maximum ("Max") values are
provided in segmentation results 100.
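
The columns of segmentation results 100 may be illustrated with the following minimal sketch. The function name `bin_summary` is hypothetical, and the use of the population standard deviation is an assumption, since the form of the standard deviation is not specified here.

```python
import statistics


def bin_summary(values, parent_size, total_size):
    """Return the per-bin columns for one parameter (e.g. AGE) of one bin."""
    size = len(values)
    summary = {
        "Size": size,
        "%Total": 100.0 * size / total_size,
        "%Parent": 100.0 * size / parent_size,
        "Mean": None, "SD": None, "Min": None, "Max": None,
    }
    if size:  # statistics are undefined for an empty bin
        summary["Mean"] = statistics.fmean(values)
        summary["SD"] = statistics.pstdev(values)  # assumed: population SD
        summary["Min"] = min(values)
        summary["Max"] = max(values)
    return summary


row = bin_summary([20, 30, 40], parent_size=6, total_size=12)
```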

Rule based segmentation window 74 also includes parameter
pop-up field 114, which may be used to select the
parameter that segmentation results 100 is based on.
Therefore, in the example of FIG. 4 the parameter "AGE" 
and its attendant statistics based on the specified segmenta- 
tion are shown in segmentation results 100. Parameter 
pop-up field 114 may be used to select the other parameters 
in the data set for performing a new segmentation on the data 
set. 

Rule based segmentation window 74 also preferably 
includes action buttons 116 that may be used to perform 
predetermined actions on the particular bin or bins of a 
segmentation. Once an existing bin is selected, selecting add 
button 118 allows adding another bin after the selected bin 
at the same level of the selected bin. Insert button 120 
performs the same function as add button 118 except that the 
new bin is introduced before the selected bin at the same 
level as the selected bin. Selecting segment button 122 
creates a "child-bin" at the next level of indentation from the 
selected bin. Activating remove button 126 removes the 
selected bin. 

Action buttons 116 in rule based segmentation window 74 
also include edit button 124, which may be used to edit a 
selected bin or segment. The add, insert, segment, and edit 
bin operations are all very similar, and an example window 
for editing a selected bin in response to the selection of edit 
button 124 is shown in FIG. 6 and is representative of the 
windows provided when add button 118, insert button 120,
or segment button 122 are selected. 

FIG. 6 shows an exemplary edit bin window 128 that may 
be used for editing a bin used with rule based segmentation
function 34. Window 128 includes bin name field 130 as
well as parent bin name field 132. Therefore, in the example
window shown in FIG. 6, the bin being edited is the bin
named "ages 25-35" that is a child-bin of a bin named "high
income." Edit bin window 128 in FIG. 6 also preferably
includes bin test field 133, which shows the test for the bin
being edited. Appropriately, for the bin named "age 25-35"
the test in field 133 is "AGE>=25 & AGE<=35." A test in
test field 133 may be validated by selecting validate test
button 133a.

Edit bin window 128 also includes available parameters 
section 134, which shows all the available parameters that 
may be used in defining or editing a bin. Once the edits to 
a bin are complete, OK button 135 may be selected to accept 
the specified edit. Alternatively, the edits to a particular bin 
may be canceled at any time by selecting cancel button 136. 

Also, edit bin window 128 includes output file checkbox 
138, which may be selected to output the contents of a 
particular bin to a file. The file name to which the bin's 
contents are to be output may be specified in output file 
name field 140. Alternatively, browse button 142 provides a 
predetermined list of files to which the segment's contents 
may be stored. 

In specifying the test for a bin using rule based segmen- 
tation function 34 of data analysis system 10, standard logic 
may be used. See, for example, the test shown in test field 
133 in FIG. 6. Table 1 below illustrates an example
operator set for logical and relational operators for devel- 
oping tests for segmentation bins. Operators are shown in 
descending precedential order. 



TABLE 1

OPERATOR    FUNCTION

(           parenthesis
)           parenthesis
<           less than
<=          less than or equal
>           greater than
>=          greater than or equal
            equal
!=          not equal
!           logical not
&           logical and
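
As a hedged illustration of how a bin test such as "AGE>=25 & AGE<=35" could be evaluated against one record using the precedence order of Table 1, a small recursive-descent evaluator is sketched below. The names `tokenize` and `evaluate` are hypothetical, and the symbol `==` for the equal operator is an assumption of this sketch.

```python
import operator
import re

# Two-character operators are listed before one-character operators.
TOKEN = re.compile(r"\s*(<=|>=|==|!=|[()<>!&]|[A-Za-z_]\w*|\d+(?:\.\d+)?)")
RELOPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
          ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unrecognized input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(test, record):
    """Evaluate a bin test against a record (a dict of parameter values)."""
    tokens = tokenize(test)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of test")
        pos += 1
        return tokens[pos - 1]

    def atom():  # highest precedence: parentheses, numbers, parameter names
        token = take()
        if token == "(":
            value = and_expr()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token[0].isdigit():
            return float(token)
        if token[0].isalpha() or token[0] == "_":
            return record[token]
        raise ValueError(f"unexpected token {token!r}")

    def rel_expr():  # relational operators bind tighter than logical not
        left = atom()
        if peek() in RELOPS:
            return RELOPS[take()](left, atom())
        return left

    def not_expr():  # logical not binds tighter than logical and
        if peek() == "!":
            take()
            return not not_expr()
        return rel_expr()

    def and_expr():  # lowest precedence
        result = bool(not_expr())
        while peek() == "&":
            take()
            right = bool(not_expr())
            result = result and right
        return result

    value = and_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return bool(value)


in_bin = evaluate("AGE>=25 & AGE<=35", {"AGE": 30})
```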



FIG. 7 shows an exemplary parameter distribution window
144 available with rule based segmentation function 34
of data analysis system 10. Once the desired bins for a 
particular segmentation have been established and applied to 
a data set, a histogram plot of the data may be generated by 
rule based segmentation function 34 as shown in FIG. 7. By 
selecting histogram plot button 96 in toolbar 80 of rule based 
segmentation window 74, parameter distribution window 
144 shown in FIG. 7 is provided. 

FIG. 7 illustrates parameter distribution window 144 
having two histogram regions, including histogram region 
146 and histogram region 148. Window 144 also preferably 
includes parameter information 150 that includes parameter 
list 152, which provides a list of parameters that may be
plotted in the histograms of window 144 and that indicates by
shading the name of the parameter that has been selected for 
depiction in histograms 146 and 148. In the example of FIG. 
7, the parameter "AGE" has been selected for plotting on the 
histograms. 

Parameter information region 150 also includes upper
histogram information 154, lower histogram information
156, and all records information 158. Histogram
informations 154 and 156 and all records information 158 in turn
include information on the number (η) of records, the
average (x̄), the standard deviation (σ), and the minimum (v) and
maximum (x) value for the data set. In the example shown
in FIG. 7, upper histogram 146 illustrates the distribution of
members within a 20,000 member class having a minimum
age of 18 and a maximum age of 94. The lower histogram
148 illustrates the distribution for the same data set but
having a minimum age of 60 and a maximum age of 94. This
different minimum age accounts for the difference between
histograms 146 and 148 and histogram informations 154 and
156.

Histograms 146 and 148 in parameter distribution window
144 can be plotted as actual values or percentage values
by an appropriate selection in values regions 160. Also,
histograms 146 and 148 may be plotted with cumulative or
non-cumulative distribution by an appropriate selection in
distribution regions 162.
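
The four plotting combinations (actual or percentage values, cumulative or non-cumulative distribution) may be sketched as follows. The function name `histogram`, the explicit list of bin edges, and the choice to express percentages relative to all supplied values are assumptions of this sketch.

```python
def histogram(values, edges, as_percent=False, cumulative=False):
    """Count values per interval [edges[i], edges[i + 1]).

    The last interval also includes its upper edge.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                counts[i] += 1
                break
    if cumulative:
        running = 0
        for i, count in enumerate(counts):
            running += count
            counts[i] = running
    if as_percent:
        total = len(values)
        return [100.0 * c / total if total else 0.0 for c in counts]
    return counts


ages = [18, 25, 40, 60, 94]
edges = [18, 40, 60, 94]
actual = histogram(ages, edges)
cumulative_percent = histogram(ages, edges, as_percent=True, cumulative=True)
```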

Histograms 146 and 148 in window 144 may be plotted 
and printed using color coding as well as shading as shown 
in FIG. 7. Histograms 146 and 148 can be saved as bit maps 
so that they may be imported into other programs, e.g., word
processing, graphics, or spreadsheet programs.

FIG. 8 provides an exemplary neural clustering function
36 in accordance with the present invention. Data analysis
system 10 preferably uses unsupervised learning neural
network techniques to cluster data from a data file or from
specific segments in a data file to identify groups with
similar characteristics. Neural clustering function 36 also
provides a generic profiling capability. Neural clustering
function 36 is different from the segmentation provided with
rule based segmentation function 34 in that the clustering is
based entirely on the statistics of the data rather than
specified logic. This contrasting analysis of a data set
provides for alternate views of the data set. The results of
neural clustering function 36 may be displayed graphically
on display 28 in an easy to understand format. Information
on the distribution of parameters for records in a particular
cluster may be viewed and relationships between parameters
may be identified. Neural clustering function 36 may also be
used to identify unusual data so that it may be examined in
more detail.

Neural clustering function 36 begins at step 164 when, for
example, clustering buttons 49 in main toolbar 46 described
in discussions relating to FIG. 2 are selected. At step 166 a
cluster setup is defined. This may be accomplished by
defining the fields in the clustering process. For example,
when the data file contains customer information, the cluster
parameters may be age, gender, income, or whatever
parameter is desirable. Also, the maximum number of clusters
should be defined at step 166. The number of clusters is
preferably a square number, e.g., 3x3, 4x4, 5x5, ....

The next step in neural clustering function 36, as
represented in FIG. 8, involves initializing the cluster map at
step 168. Before the actual clustering process may commence,
neural clustering function 36 preferably automatically
prepares a random set of "generic records" and assigns one
record to represent each cluster. Continuing the customer
data example, these "generic records" would be "generic
customers." Each "generic record" has a set of cluster
parameters that are randomly generated. Neural clustering
function 36 generates the clusters randomly in a
two-dimensional grid or cluster map, as shown in FIG. 9A for
an 8x8 cluster map. Cluster 1 and Cluster 2 have been
randomly identified in cluster map 170 for a particular
(undefined) set of parameters. It is noted that neural
clustering function 36 is not limited to using an 8x8 map as
shown in FIGS. 9A and 9B. Maps of other sizes may be used
without departing from the spirit and scope of the present
invention.

Returning to neural clustering function 36 in FIG. 8, at
step 172 the clustering process is started. In starting the
clustering process, neural clustering function 36 takes the
first record in the database, identifies the generic cluster that
is most similar to that record, and then modifies that cluster
to provide a closer match to the actual record. In one
embodiment of neural clustering function 36, a Euclidean
distance is used as the matching metric between an input
record and a "generic record" cluster. Neural clustering
function 36 also modifies the clusters in the immediate
neighborhood of the chosen cluster to make them more
similar to the chosen cluster. This process is illustrated in
FIG. 9B in cluster map 176.
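
The initialization and first update steps may be sketched as follows, in the style of a conventional self-organizing map. The function names, the uniform random initialization within given parameter ranges, the square neighborhood, the wrap-around of neighbors at the map edges, and the rule that neighbors move toward the input record (which also moves them toward the chosen cluster) are all assumptions of this sketch.

```python
import math
import random


def init_cluster_map(width, ranges, seed=None):
    """Create a width x width grid of randomly generated "generic records"."""
    rng = random.Random(seed)
    return [[[rng.uniform(low, high) for low, high in ranges]
             for _ in range(width)] for _ in range(width)]


def best_match(cluster_map, record):
    """Return (row, col) of the cluster nearest the record (Euclidean)."""
    cells = ((math.dist(weights, record), (r, c))
             for r, row in enumerate(cluster_map)
             for c, weights in enumerate(row))
    return min(cells)[1]


def update(cluster_map, record, rate=0.5, radius=1):
    """Move the best-matching cluster and its neighbors toward the record.

    Assumes 2 * radius + 1 <= map width so that no cell is updated twice.
    """
    width = len(cluster_map)
    best_row, best_col = best_match(cluster_map, record)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            cell = cluster_map[(best_row + dr) % width][(best_col + dc) % width]
            for i, x in enumerate(record):
                cell[i] += rate * (x - cell[i])
    return best_row, best_col


# Parameter ranges (AGE, INCOME) are illustrative values only.
cluster_map = init_cluster_map(8, [(18, 94), (0, 100000)], seed=1)
chosen = update(cluster_map, [30.0, 42000.0])
```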

Returning to FIG. 8, at step 174 in neural clustering
function 36 the clustering process is completed. In doing so,
function 36 takes each successive record, identifies the best
match in the cluster map, and, as before, modifies that
cluster and the clusters in its immediate neighborhood, as
shown in FIG. 9B. This process may be repeated a number of
times on the total data set with the degree of modification
and size of neighborhood modified in the cluster map
gradually reducing as the training proceeds. The result of
this is that neural clustering function 36 initially coarsely
separates the customers into major groups in different parts
of the cluster map. Function 36 then progressively defines
these major groups into further groups having more subtle
distinct characteristics.

Neural clustering function 36 may also have a
self-organizing capability in that, at the end of the clustering
process, two clusters next to each other on a cluster map will
have a high degree of similarity, while clusters on totally
different parts of the map will be quite different. It should
also be noted that a cluster map has no edges, and that the
cluster cell on the top edge of the map is actually adjacent
to one on the bottom edge, and the same is true for cells on
the left and right edges of the cluster map. The cluster map
is therefore a toroidal surface.
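
The edgeless, toroidal adjacency may be illustrated by a wrap-around distance between two cells of the map. The function names are hypothetical, and measuring neighborhood distance as the larger of the row and column offsets is an assumption of this sketch.

```python
def toroidal_offset(a, b, size):
    """Shortest separation between indices a and b on a ring of given size."""
    d = abs(a - b) % size
    return min(d, size - d)


def grid_distance(cell_a, cell_b, width):
    """Neighborhood distance between two (row, col) cells on a toroidal map."""
    row_offset = toroidal_offset(cell_a[0], cell_b[0], width)
    col_offset = toroidal_offset(cell_a[1], cell_b[1], width)
    return max(row_offset, col_offset)


# On an 8x8 map, opposite corners are adjacent because the edges wrap.
corner_distance = grid_distance((0, 0), (7, 7), 8)
```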

Returning to FIG. 8, the next step in neural clustering
function 36 is step 178 where the results of the clustering
process are analyzed. Once the clustering process is
complete at step 174, a variety of analysis results are available
with neural clustering function 36. One type of result
available is cluster occupancy. This indicates the number of
members in each cluster. This may be presented by color
coding the cluster map with, for example, red clusters
denoting high occupancy (large number of members) and
blue clusters having low occupancy (small number of
members).
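
Cluster occupancy may be computed as in the following minimal sketch, which assumes that each record has already been assigned an integer cluster identification numbered from zero. The function name `occupancy` is hypothetical.

```python
from collections import Counter


def occupancy(cluster_ids, n_clusters):
    """Return the number of member records for each cluster, empty ones included."""
    counts = Counter(cluster_ids)
    return [counts.get(cluster, 0) for cluster in range(n_clusters)]


members_per_cluster = occupancy([0, 0, 3, 1, 0], n_clusters=4)
```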

Another type of result available with neural clustering
function 36 is the mean, standard deviation, and minimum
and maximum values for any parameter for each cluster.
This may also be accomplished by color coding a cluster
map. By selecting different parameters for display on a
color-coded cluster map, changes in the color of the cluster
map as the selected parameter changes allow for visualizing
the distribution of members for each parameter.

The next level of results preferably available with neural
clustering function 36 allows for viewing the mean,
minimum, and maximum values for any selected parameter for a
single cluster and comparing these values with population
averages. The parameters may also be ranked in terms of
mean value. Also, a view of the complete distribution of any
parameter for any cluster may be provided, and distributions
between clusters or between one cluster and the total
population may be compared.

Once analyze step 178 is complete, the clustering process
may be refined at step 180. In the customer database
example, refining the analysis of the clusters may provide
information about the customers, the major customer
groups, the profiles of those customers, the differentiating
factors of the groups, and any significant associations
between the defining factors of each group. Based on this, it
may be desirable to recluster the data, possibly using a
different set of variables, a coarser or finer map, or using a
subset of the original data.

The next step in neural clustering function 36 as
represented in FIG. 8 is tagging step 182. One of the goals of
neural clustering function 36 is to produce a set of
statistically significant record groups, each with an intuitively
sound profile but with exploitable behavioral characteristics.
In the customer database example, clustering on demographic
information such as age, income, gender, occupation, time in
residence, etc., will produce a set of customers with similar
demographic profiles. It is invariably the case, however, that
these customer groups will have different lifestyle, attitude,
and behavioral characteristics. In particular, certain clusters
may have significantly higher than average propensity to
respond to direct marketing material for a specific product.
This can be exploited by ranking the clusters in terms of
"propensity to respond" and targeting individuals whose
profiles match the highest scoring groups. This may be
accomplished at tagging step 182 of neural clustering function 36.
Once tagging step 182 is complete, neural clustering
function 36 may be exited at step 183.
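
Ranking clusters by "propensity to respond" may be sketched as follows. The function name `rank_clusters` and the representation of the input as (cluster identification, responded) pairs are assumptions of this sketch, and the sample history is invented purely for illustration.

```python
from collections import defaultdict


def rank_clusters(tagged_responses):
    """Rank clusters by response rate, highest first.

    tagged_responses is an iterable of (cluster_id, responded) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # cluster -> [members, responders]
    for cluster_id, responded in tagged_responses:
        totals[cluster_id][0] += 1
        if responded:
            totals[cluster_id][1] += 1
    rates = {cluster: responders / members
             for cluster, (members, responders) in totals.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


history = [(1, True), (1, False), (2, True), (2, True), (3, False)]
ranking = rank_clusters(history)
```

Individuals whose records fall in the clusters at the head of the ranking would then be the preferred targets.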

FIG. 10 shows an exemplary run neural clustering win- 
dow 184 for use with neural clustering function 36 in 
accordance with the present invention. Window 184 prefer- 
ably includes toolbar 186, data and configurations section 
188, and detailed run information section 190. Toolbar 186 
includes initiate fresh network configuration button 192, 
retrieve network configuration button 194, save data input 
configuration button 196, and save data input configuration 
as button 198. Toolbar 186 also includes run clustering
process button 200 and stop clustering process button 202. 
Additionally, toolbar 186 includes exit button 204 for exiting 
neural clustering function 36. 

Data and configuration section 188 in run neural cluster- 
ing window 184 provides information on each clustering 
run. An existing configuration file may be loaded or modi- 
fied using retrieve network configuration button 194 in 
toolbar 186. In the example shown in FIG. 10, file
"DEF.CNC" is being processed with neural clustering
function 36. Data and configurations section 188 indicates that
input file "DATA.BDT" is being used for the clustering
process, that the first "Run" has been completed, and that the
clustering mode is clustering ("C") (training) mode. Also, in
the example of FIG. 10, the "Input Weights for Training" is
"none," indicating that the run is not a continuation of
previous clustering sessions. Data and configurations section
188 also includes information on the "Weights File" for a
given clustering process, and in the example shown in FIG.
10 the "Weights File" is "DEF.BKW." This file is used for
weighting purposes during a clustering run.

A new clustering configuration may be created with run
neural clustering window 184 using add button 206, edit
button 208, remove button 210, and copy button 212 in data
and configurations section 188. If an existing configuration
is not used for generating a new run, then an entirely new
clustering specification must be generated. Neural clustering
function 36 provides an appropriate window for setting up a
new run via add button 206.

FIG. 11 illustrates an exemplary clustering setup window 
214 for setting up clustering parameters in accordance with 
neural clustering function 36. Clustering setup window 214 
preferably includes files and ranges section 216, output
parameter section 218, parameter normalization section 220,
parameter selection section 222, and training setup section
224.

Inputs to the various fields in files and ranges section 216 
are available through browse and select buttons 226. Each
browse button pulls up a dialogue box having a list of files 
for selection. The select button pulls up a dialogue box that 
offers choices as to the range of processing for a file. The 
processing range may be either the entire file or a specified 
start and end location.

Output parameter section 218 includes recall only
checkbox 228 and produce tagged data file checkbox 230. When
recall only checkbox 228 is selected, neural clustering
function 36 runs in "recall mode", i.e., using a cluster model
created earlier. A new set of data records may then be applied
to an existing cluster map to see how the records correlate
with an existing model. Additionally, weights in checkbox
229 in files and ranges section 216 should be checked when
the clustering process is run in recall mode.

Selecting produce tagged data file checkbox 230 in output
parameter section 218 causes neural clustering function 36
to produce a data file containing all the original information
in the data file together with an additional field containing
the cluster identification for each record. Even when recall
only checkbox 228 is not selected, neural clustering function
36 performs a single recall run at the end of the allotted
number of training cycles to determine the cluster for each
record. The resulting data file may then be used in rule
based segmentation function 34 to select particular clusters
that have desirable features.
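
A recall run that appends a cluster identification field to each record may be sketched as follows. The function name `tag_records`, the row-major numbering of clusters from zero, and the use of a Euclidean best match are assumptions of this sketch.

```python
import math


def tag_records(records, cluster_map):
    """Return copies of the records with a trailing cluster identification."""
    width = len(cluster_map[0])
    tagged = []
    for record in records:
        # Pick the nearest cluster; ties resolve to the lowest cluster number.
        _, cluster_id = min(
            (math.dist(weights, record), r * width + c)
            for r, row in enumerate(cluster_map)
            for c, weights in enumerate(row)
        )
        tagged.append(list(record) + [cluster_id])
    return tagged


trained_map = [[[0.0, 0.0], [0.0, 10.0]],
               [[10.0, 0.0], [10.0, 10.0]]]
tagged = tag_records([[1.0, 1.0], [9.0, 9.0], [1.0, 9.0]], trained_map)
```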

Additionally, whenever a neural clustering run is
performed, neural clustering function 36 may produce a
textual summary file containing, for example, cluster versus
parameter information. The file typically has a header
followed by four main sections containing mean, standard
deviation, and minimum and maximum data, respectively.
Also, each row in the file may contain a "used" flag
indicating whether the parameter was used as an input to the
clustering process, followed by the mean value (or the
standard deviation, etc.) for each cluster. The file may be
single comma delimited, and the numbers in the file may be
output to six significant figures. The file should also
preferably be formatted so that it can be easily read into other
applications such as, for example, spreadsheet applications.
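
One possible layout for such a summary file is sketched below. The header line, the section labels, the leading parameter name in each row, and the 1/0 encoding of the "used" flag are assumptions of this sketch. The format specifier `.6g` yields at most six significant figures and drops trailing zeros.

```python
import io


def write_summary(out, stats, used, n_clusters):
    """Write a comma-delimited summary: a header, then four sections.

    stats maps "mean", "sd", "min", "max" to {parameter: [value per cluster]}.
    used is the set of parameter names that were inputs to the clustering.
    """
    out.write(f"clusters,{n_clusters}\n")
    for section in ("mean", "sd", "min", "max"):
        out.write(section + "\n")
        for parameter, per_cluster in stats[section].items():
            flag = "1" if parameter in used else "0"
            values = ",".join(f"{value:.6g}" for value in per_cluster)
            out.write(f"{parameter},{flag},{values}\n")


# Illustrative values only.
stats = {
    "mean": {"AGE": [30.1234567, 61.5]},
    "sd": {"AGE": [4.0, 5.25]},
    "min": {"AGE": [18, 60]},
    "max": {"AGE": [59, 94]},
}
buffer = io.StringIO()
write_summary(buffer, stats, used={"AGE"}, n_clusters=2)
summary_lines = buffer.getvalue().splitlines()
```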

Histogram output section 232 in clustering setup window
214 provides a checkbox for the creation of histograms for
each parameter in each cluster. This information is
calculated during the clustering run and stored in a separate file
so that during analysis the distribution of parameters across
the cluster map may be viewed and analyzed in a graphical
manner.

Parameter normalization section 220 in clustering setup
window 214 preferably provides three parameter
normalization options. Normalization rescales the parameters so
that they all have the same dynamic range, e.g., minimum or
maximum values. If the data is not normalized prior to
processing, and some parameters have a large range of
values, these values may dominate the processing and
produce erroneous results.

One of the normalization options in section 220 is a "use
mean and standard (std.) deviation (dev.)" option. When this
parameter normalization is selected, normalization of a
parameter involves subtracting the mean from the parameter
and dividing the result by the standard deviation for the
parameter. Therefore, if the parameters are normally
distributed, the variables will then become distributed with
a mean of 0 and variance of 1. This normalization option is
generally recommended if, for example, two parameters vary
equally but over different ranges. This option will then
normalize the distributions so that the parameters vary over
the same range.
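
The "use mean and standard deviation" option may be illustrated as follows. The function name `normalize` is hypothetical, and the use of the population standard deviation and the treatment of a constant parameter (mapped to zeros) are assumptions of this sketch.

```python
import statistics


def normalize(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumed: population standard deviation
    if sd == 0:
        return [0.0 for _ in values]  # a constant parameter carries no spread
    return [(value - mean) / sd for value in values]


# Two parameters that vary equally but over different ranges.
ages = normalize([20, 30, 40])
incomes = normalize([20000, 30000, 40000])
```

After normalization the two parameters take the same values, so neither dominates a distance calculation merely because of its units.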

Also within parameter normalization section 220 is a
"user defined offset and gain" normalization option.
Selecting this type of normalization allows for using prior
information to weight a particular parameter more strongly or
weakly when the emphasis of the parameter is known.
Parameter normalization section 220 also includes a "neither"
normalization option. Selecting this option prevents
normalization of the data. This option may be useful if one or
two of the parameters are known to dominate the data compared
with all others. In this case, attention may be paid to the
lesser parameters, or they may possibly be excluded
altogether.

Unit normalized vectors checkbox 234 in clustering setup 
window 214 may be selected to normalize all data to lie in 
a unit sphere. This is similar to the "use mean and standard 
deviation" option in parameter normalization section 220 
except that the parameters are normalized all at once rather 
than on a per parameter basis. This option may be useful 
when it is suspected that a number of rogue data points exist
within the data set. If some parameters have much greater 
variability than others, however, they will become the domi- 
nant factors in the clustering process. 
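
One reading of unit normalized vectors is that each record is scaled as a whole to unit length. That interpretation, and the function name `unit_normalize`, are assumptions of this sketch.

```python
import math


def unit_normalize(record):
    """Scale a record so that its Euclidean length is one."""
    norm = math.sqrt(sum(x * x for x in record))
    if norm == 0:
        return list(record)  # an all-zero record is left unchanged
    return [x / norm for x in record]


balanced = unit_normalize([3.0, 4.0])
# A parameter with a much larger magnitude dominates the scaled record.
dominated = unit_normalize([30.0, 50000.0])
```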

Parameter selection section 222 in an exemplary cluster- 
ing setup window 214 of FIG. 11 shows general information 
on the parameters that may be used in the clustering process. 
Section 222 includes parameters available field 222a that 
indicates the number of fields in the chosen data set and 
parameters selected field 222b that indicates the number of 
parameters selected for the current clustering run. Param- 
eters for clustering may be selected by choosing select 
parameters for clustering button 222c that provides access to 
a parameter selection window. 

FIG. 12 illustrates an exemplary parameter selection 
window 236 for selecting a parameter for use in neural 
clustering function 36 of data analysis system 10 of the 
present invention. Parameter selection window 236 prefer- 
ably includes data file information section 238, which 
includes a data file name field, a parameters available field 
for the selected data file, and a parameters selected field for 
specifying the number of parameters to be used in a clus- 
tering run. 

Parameter selection window 236 also preferably includes 
available parameters section 240, which includes a list of all 
available parameters in the specified data set. Selected 
parameters section 242 in parameter selection window 236 
includes the names of all the parameters that have been 
selected for the current neural clustering run. Using include 
button 244, remove button 246, include all button 248, and 
remove all button 250, parameters may be moved between 
available parameters section 240 and selected parameters 
242 as desired for a given neural clustering run. 

Once the parameter selection is complete, parameter 
selection window 236 may be closed by selecting OK button 
251. Alternatively, a parameter selection process may be 
cancelled at any time by selecting cancel button 253 in 
window 236. 

Returning to FIG. 11, training setup section 224 in
clustering setup window 214 may be used to define the size of
a cluster map as well as the number of training cycles for a
cluster run. Section 224 accordingly includes map width
input field 252 for specifying the number of clusters along
one edge of a cluster map. The total number of clusters is
preferably the square of this number. A default for the
number of clusters may be set at, for example, four and a
maximum number of clusters may be limited to, for
example, thirty. Number of training cycles input field 254
may be used to input and display the number of complete
runs through a data file by neural clustering function 36.
Therefore, if the data consisted of, for example, 10,000
items and training cycles input field 254 is ten, then 10 times
10,000 equals 100,000 passes through the clustering
network. A default number of training cycles may be set at, for
example, 10 cycles.

Training setup section 224 in clustering setup window 214
in FIG. 11 also preferably includes advanced clustering
configuration button 256. Selecting button 256 provides
access to advanced clustering configuration section 258 in
window 214. Section 258 includes initial update
neighborhood size input field 260, which specifies a radius
that should be set to approximately 30% to 40% of the total
map dimensions, e.g., for an 8x8 cluster map input field 260
should be set to 3 or 4. Values under 20% of the map size in
initial update neighborhood size input field 260 could lead
to unused clusters within the map unless the map is
further trained in a later session. A default value for field 260
may be set to, for example, 1.9.

Final update neighborhood size input field 262 of
advanced clustering configuration section 258 should
preferably be set between zero and one. If field 262 is set to zero,
then every cluster has a possibility of becoming a cluster
center. If set to one, there is a possibility of a lesser
distinction between adjacent clusters. A default for field 262
is, for example, 0.5. Weight update factor input fields 264
include an initial weight update factor field and a final
weight update factor field. The weight update factors
determine how fast the network adapts to each new example. A
large initial weight update factor is used to quickly establish
the network cluster structure. The final weight update factor
is used at the end of the training process to further define the
clustering structure. A default initial weight update factor of,
for example, 0.9 and a default final factor of, for example,
0.1 may be suitable.
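
The gradual reduction from the initial to the final settings over the training cycles may be sketched as a simple interpolation. The linear form of the schedule and the function name `schedule` are assumptions of this sketch; only the initial and final values are taken from the example defaults above.

```python
def schedule(initial, final, cycle, n_cycles):
    """Interpolate linearly from initial (first cycle) to final (last cycle)."""
    if n_cycles <= 1:
        return initial
    fraction = cycle / (n_cycles - 1)
    return initial + (final - initial) * fraction


n_cycles = 10
# Weight update factor falling from 0.9 to 0.1 over the run.
update_factors = [schedule(0.9, 0.1, c, n_cycles) for c in range(n_cycles)]
# Update neighborhood size shrinking from 3 (8x8 map example) to 0.5.
neighborhoods = [schedule(3.0, 0.5, c, n_cycles) for c in range(n_cycles)]
```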

Also within advanced clustering configuration section
258 is randomize training data checkbox 266. Checkbox 266
may be selected to help avoid the possibility that some
artificial clusters may form. Training times will increase,
however, when checkbox 266 is selected. Advanced clustering
configuration section 258 also preferably includes force
activation update checkbox 268. Selecting checkbox 268
allows the cells that may have been frozen out earlier in the
clustering process, i.e., are empty, to take part in the
clustering process again. So, for example, if due to a random
effect most of the clusters are forming in an 8x3 region of
an 8x8 map, by selecting force activation update checkbox
268, neural clustering function 36 will allow the clusters in
the 8x3 region to spread out again and make full use of the
8x8 cluster map. A default value for checkbox 268 is, for
example, not to force activation update.

Once all of the setup information is input into neural
clustering setup window 214, then OK button 270 may be 
selected to initiate a clustering run. Alternatively, a cluster- 
ing setup may be cancelled at any time by selecting cancel 
button 272. 

Once a neural clustering run has been made within neural
clustering function 36, various analyses of the results are
possible. Neural clustering function 36 allows for
segmenting the data set in terms of similarity to a set of user defined
criteria. For example, in a customer marketing application,
the data set may include information on customer
socio-demographics, such as age, income, occupation, and
lifestyle interests. Neural clustering function 36 within data
analysis system 10 may be used to select a subset of these
parameters and cluster on them to determine whether they
fall into natural groupings that allow for more selective,
personalized marketing.

Continuing the customer marketing example, the present 
system's clustering analysis capability provides a
mechanism for generating a better understanding of who the
customers are and how they behave. Clustering function 36 
can be used to identify the most commonly occurring 
customer types and also the more unusual customers. It may 
also be used to identify the discriminating characteristics of 
individuals who buy particular products or services, or that 
behave in a particular way. Using neural clustering function 
36, previously unknown relationships can be uncovered in 
data and expected relationships may be verified quickly and 
easily. 

FIG. 13 illustrates an exemplary clustering analysis win- 
dow 274. Window 274 preferably includes toolbar 276, 
cluster map 278, parameter statistics information 280, and 
parameter graphs 282. Toolbar 276 includes initiate new 
input data configuration button 284, open new input data 
configuration button 286, save data input configuration 
button 288, and save data input configuration as button 290. 
Select results button 292 in toolbar 276 may be used to select 
a particular results file for further analysis in clustering 
analysis window 274. 

Occupancy (n) button 294 in toolbar 276 may be selected
to view the number of members in each cluster. Mean or
average-value (x̄) button 296 may be selected to view the
average value of a currently selected parameter for each cluster.
Standard deviation (σ) button 298 may be selected to view
the standard deviation for a currently selected parameter for 
each cluster. Minimum value (x) button 300 and maximum 
value (x) button 302 may be used to view the minimum and 
maximum value of a currently selected parameter for each 
cluster, respectively. 
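
The per-cluster figures behind these toolbar buttons can be illustrated with a short Python sketch. The record layout (a dictionary holding a "cell" key and parameter values), the function name cluster_statistics, and the use of the population standard deviation are all assumptions for illustration; the description does not say how the system stores records or which standard deviation it computes.

```python
from statistics import mean, pstdev


def cluster_statistics(records, parameter):
    """Group records by cluster cell and summarize one parameter per cell."""
    by_cell = {}
    for record in records:
        by_cell.setdefault(record["cell"], []).append(record[parameter])
    return {
        cell: {
            "occupancy": len(values),   # number of members in the cell
            "mean": mean(values),
            "std": pstdev(values),      # population standard deviation (assumed)
            "min": min(values),
            "max": max(values),
        }
        for cell, values in by_cell.items()
    }


# Made-up records, each already assigned to a cluster cell.
records = [
    {"cell": 10, "AGE": 30},
    {"cell": 10, "AGE": 40},
    {"cell": 3, "AGE": 55},
]
stats = cluster_statistics(records, "AGE")
```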

Also, preferably contained in toolbar 276 is current 
parameter selection field 304 that displays the name of the 
current parameter selection. Initiate histogram plotter button 
306 in toolbar 276 may be selected to initiate the plotting of 
a histogram while exit button 308 in toolbar 276 may be 
selected at any time to exit the clustering analysis portion of 
neural clustering function 36. 

In one embodiment of the present invention, cluster map 
278 is displayed as a multi-hued square grid. Alternatively, 
appropriate grey-scaling or shading, as shown in FIG. 13,
may be employed for cluster map 278. Each cell in cluster 
map 278 may be color-coded representing minimum and 
maximum values for the selected parameter. Unoccupied
cells may be displayed in gray, and cells of similar color 
aggregate to form a cluster. Cluster map 278 is a continuous 
surface and there are no edges to the map. This means that 
if a cluster forms on the bottom edge and there is a similar 
cluster directly above the top edge then these cells prefer- 
ably aggregate to form the same cluster. The user may 
interact with the cluster map via pointing device 18. Box 310
denotes the current cluster cell selection. 
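
A map with no edges can be modeled as a grid whose rows and columns wrap around. The sketch below is one assumed way to express that in Python; the four-cell neighborhood, the grid-step distance measure, and the function names are illustrative choices that the description does not specify.

```python
def toroidal_neighbors(row, col, rows=8, cols=8):
    """Return the four edge-adjacent cells of (row, col) on a wrap-around grid."""
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]


def toroidal_distance(a, b, rows=8, cols=8):
    """Grid-step distance between two cells when both axes wrap around."""
    d_row = abs(a[0] - b[0])
    d_col = abs(a[1] - b[1])
    return min(d_row, rows - d_row) + min(d_col, cols - d_col)


# A cell on the bottom row of an 8x8 map is adjacent to the top row.
bottom_neighbors = toroidal_neighbors(7, 2)
edge_gap = toroidal_distance((7, 2), (0, 2))
```

Under this model, a cluster forming on the bottom edge and a similar cluster in the same column on the top edge are one step apart, so they can aggregate into the same cluster.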

Directly adjacent to clustering map 278 in window 274 of 
FIG. 13 are summary statistics 312. Summary statistics 312 
include occupancy, mean value, standard deviation, and 
minimum and maximum values for the currently selected 
parameter in the selected cell as well as the currently 
selected parameter with respect to the whole cluster map. 

When a new cluster map is loaded for analysis via 
window 274, the default statistic for the map is occupancy, 
i.e., the number of members in each cell. By selecting 
occupancy button 294, standard deviation button 298,
minimum value button 300, or maximum value button 302 in
toolbar 276, the appropriate statistics will display in
summary statistics 312. In the example shown in FIG. 13, the
currently selected parameter is readership for the Daily 
Mirror as represented by parameter name "MIRROR" in 
parameter field 304. According to the example presented in 
FIG. 13, approximately 18% of the occupants in the cell 
selected with box 310, cluster number 10 as identified in the 
"Cluster" field of summary statistics 312, read the Daily 
Mirror, compared with the average of around 8% for the 
whole map. 

Parameter statistics information 280 in clustering analysis 
window 274 preferably displays the mean, standard 
deviation, and minimum and maximum value for all avail- 
able parameters, together with a flag ("Y") under the input 
column indicating whether the parameter was used during 
the clustering process. Using pointing device 18, a param- 
eter in parameter statistics information 280 may be selected 
and plotted in parameter graphs 282. 

Parameter graphs 282 display a graph of the currently 
selected parameters. Each parameter is plotted on a hori- 
zontal scale, normalized from zero to one. Downward facing 
triangles 314 denote the average value of the parameter for 
the selected cell, compared with the average value of the 
parameter for the population as a whole that is represented 
by upward facing triangles 316. 

Parameter graphs 282 are controlled with control buttons 
318. Buttons 318 include all button 320 that when selected 
causes all parameters for the selected cell to be displayed in 
parameter graphs 282. All inputs button 322 may be selected 
to display a graph of all cluster inputs for a selected cluster 
cell. Selecting none button 324 prevents parameter graphs 
282 from being displayed. Also, control buttons 318 include 
options button 326 that pulls up a graph options dialogue 
box. 

FIG. 14 illustrates exemplary graph options dialogue box 
328 that may be used to modify the graphs displayed in 
parameter graphs 282 in clustering analysis window 274 
when options button 326 is selected. Using dialogue box 328 
the graphs in cluster analysis window 274 may be modified 
as desired. By selecting none option 330, which may be the 
default option, the parameters in parameter graphs 282 will 
be presented in the order that the parameters are stored in the 
data set. Selecting cluster mean option 332 allows the 
parameters to be displayed in the order of maximum mean 
value. By selecting cluster and population mean difference 
option 334, the parameters within a cell that vary most 
significantly from the norm, i.e., the overall population, are 
displayed. The parameters will then be ranked and presented 
in terms of maximum positive deviation through to maxi- 
mum negative deviation. Also, selecting cluster and popu- 
lation mean absolute difference option 336 allows the 
parameters to be ranked in absolute terms, i.e., ranked in 
terms of absolute variation from the population mean. 
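
The four ordering options can be sketched in Python as follows. The function name rank_parameters, the option labels, and the sample values are hypothetical, except that the MIRROR figures echo the 18% and 8% readership mentioned for FIG. 13; the sketch assumes each parameter has already been normalized to the zero-to-one scale used by the parameter graphs.

```python
def rank_parameters(cluster_means, population_means, order="none"):
    """Order parameter names according to one of the graph options."""
    names = list(cluster_means)  # "none": keep the stored order
    if order == "cluster_mean":
        return sorted(names, key=lambda n: cluster_means[n], reverse=True)
    diff = {n: cluster_means[n] - population_means[n] for n in names}
    if order == "difference":
        # Maximum positive deviation through to maximum negative deviation.
        return sorted(names, key=lambda n: diff[n], reverse=True)
    if order == "absolute_difference":
        return sorted(names, key=lambda n: abs(diff[n]), reverse=True)
    return names


# Illustrative normalized means for one selected cell and the whole map.
cluster_means = {"AGE": 0.30, "INCOME": 0.75, "MIRROR": 0.18}
population_means = {"AGE": 0.50, "INCOME": 0.45, "MIRROR": 0.08}

by_difference = rank_parameters(cluster_means, population_means, "difference")
by_absolute = rank_parameters(cluster_means, population_means, "absolute_difference")
```

In this example AGE ranks last by signed difference because the cell is below the population mean, but second by absolute difference because the size of that gap is large.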

Use labels checkbox 338 in graph options dialogue box
328 is typically checked as a default option. When not
checked, parameter graphs 282 will be drawn without the 
spread range bars, i.e., with just triangles 314 and 316. Also 
when checkbox 338 is not checked, parameter graphs 282 
will be scaled according to the range of the largest parameter 
in the set. Once the options in dialogue box 328 are 
specified, box 328 may be closed by selecting OK button 
337. Changing the graph options may be canceled at any 
time by selecting cancel button 325. 

Returning to FIG. 13, a histogram for any cell or group of 
cells may be initiated by selecting histogram button 306 in 
toolbar 276 in clustering analysis window 274. Selecting
histogram button 306 presents parameter distribution
window 340, an example of which is shown in FIG. 15.
Window 340 allows for exploring the distribution of
individual parameters across one or more cluster cells.

Parameter distribution window 340 of FIG. 15 preferably
includes upper histogram 342 and lower histogram 344.
Window 340 also includes parameter list section 345 that
may be used to select the parameter for plotting in the
histograms. Histogram information section 346 includes
cluster number and occupancy field 346a, occupancy field
346b, mean field 346c, standard deviation field 346d, and
minimum value 346e and maximum value 346f fields for
both histograms and the whole data set.

In the example shown in FIG. 15, the distribution of the
parameter AGE selected in parameter list section 345 is shown
in upper histogram 342 for cluster cell number "2" and in
lower histogram 344 for cluster cell number "3". Summary
statistics 346, e.g., occupancy, mean, standard deviation, and
minimum and maximum values, are also provided for upper
342 and lower 344 histograms compared to the data set as a
whole. Histograms 342 and 344 in window 340 may be
displayed as actual values, which is the case in the example
of FIG. 15. Alternatively, histograms 342 and 344 can be
displayed as percentage values by making appropriate
selections in values sections 347 of window 340.
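
The difference between the actual-value and percentage displays can be illustrated with a small Python sketch. The function name histogram, the bin edges, and the sample ages are made up for illustration and do not come from FIG. 15.

```python
def histogram(values, bin_edges, as_percentage=False):
    """Count values per bin; optionally express each bin as a percentage."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            # Bins are half-open, except that the last bin includes its upper edge.
            if bin_edges[i] <= v < bin_edges[i + 1] or (last and v == bin_edges[-1]):
                counts[i] += 1
                break
    if as_percentage:
        total = sum(counts)
        return [100.0 * c / total if total else 0.0 for c in counts]
    return counts


# Hypothetical AGE values for the members of one cluster cell.
ages = [22, 25, 27, 31, 44, 47, 52, 58]
edges = [20, 30, 40, 50, 60]

actual = histogram(ages, edges)
percent = histogram(ages, edges, as_percentage=True)
```

Percentage values make two cells with very different occupancies directly comparable, which actual counts do not.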

Copy Hist 1 button 348 and Copy Hist 2 button 350 copy
the associated histogram to a clipboard in order that the 
histograms can be imported as bit maps to an appropriate 
application, for example, a word processing application. 
Once copied to the clipboard, the histograms may then be 
pasted into a word processing document. 

Neural prediction function 38 of data analysis system 10
of the present invention provides predictive modeling
capability. This capability may be particularly beneficial in a
customer analysis setting in predicting future behavior of
current or prospective customers by learning from actual
customer behavior. Neural prediction function 38 utilizes
supervised learning neural network technology having the
capability to learn from historical behavior stored in
database(s) 44. This technique may be used to predict any
aspect of behavior for which records of historical behavior 
are stored in database(s) 44. For customer databases, this 
behavior may include product preference, customer 
profitability, credit risk, and likelihood of fraud. In imple- 
menting a direct marketing campaign, for example, neural 
prediction function 38 may be used to analyze records of 
individuals who did and did not respond to marketing 
campaigns. Function 38 may be used to score prospect lists 
to identify those individuals most likely to respond to a 
future marketing campaign. 

Neural computing is an advanced statistical data
processing technique. Unlike conventional techniques that require
programming with complex rules and algorithms, neural
networks develop their own solutions to problems by
learning from examples taken from the real world. For suitable
applications, the technique can provide exceptional benefits
in the ability to rapidly develop effective, computationally
efficient solutions to complex data processing problems.

One embodiment of neural prediction function 38 of data 
analysis system 10 of the present invention uses a type of 
supervised learning neural network known as a "multi-layer
perceptron" (MLP) network. An MLP comprises a large number
of simple interconnecting processing elements (neurons) 
arranged in three layers. Each neuron within the architecture 
combines the weighted outputs from the neurons in the 
previous layers, passes this through a non-linear transfer 
function, and feeds the results on to the neurons in the next
layer. The input neurons take in the input information, while
the output neurons provide the prediction. In the customer
data set example, the input neurons receive customer
information stored in a database and the output neurons produce
the customer behavior prediction.

FIG. 16A illustrates MLP neural prediction network 352.
Network 352 comprises three levels of neurons 354,
including first level 356, second level 358, and third level 360.
Each neuron in a given level connects to every neuron in the
level below and above it, where appropriate, by
interconnections 362.

FIG. 16B represents the functionality of each neuron 354
in neural network 352. Each neuron combines the weighted
outputs from the neurons in the previous layer, passes the
result through a non-linear transfer function, and feeds the
results to the next layer of neurons.

Because neural network 352 represented in FIGS. 16A
and 16B uses a non-linear processing element as the
fundamental building block of system neuron 354, it is capable
of modeling complex non-linear relationships. Also, the
weights on interconnections 362 (not explicitly shown)
determine the nature of the predictions made by neural
network 352. These weights are defined during a "training"
process with the system of weights effectively representing
"knowledge" derived from the training data.

FIG. 17 illustrates an exemplary flowchart for neural
prediction function 38 of data analysis system 10 of the
present invention. Neural prediction function 38 begins at
step 364 whenever one of prediction buttons 50 in main
toolbar 45 is selected (see FIG. 2). At step 366 in neural
prediction function 38 a predictive model setup is defined.
Step 366 essentially involves defining the parameters that are
to be predicted. In the customer database example, the
parameters to be predicted may include, for example, mail
responsiveness, credit risk, profitability, etc. Also at step 366
the parameters that the predictions are to be based on are
specified, e.g., age, income, etc. Additionally, at step 366 the
data may be divided into two groups: a training set for use
in developing the models and an independent test set for use
in testing the predictive capability.

At initialize predictive network step 368 in neural
prediction function 38, a random set of network weights for
interconnections 362 is generated. Next, at step 370 the
training process is started. In start training process step 370
neural prediction function 38 takes the first record and enters
the appropriate information into the neural network's input
neurons (neuron level 356 in FIG. 16A). Because these
initial weights are randomly chosen, the network's initial
output prediction is random. This initial prediction is
compared to known historical behavior for that record and a
training algorithm is used to alter the weights on
interconnections 362 so that the next time a similar record is
presented to the network its prediction will be closer to
known historical behavior.

At step 372 the training process is completed. This
involves repeating start training process step 370 a number
of times for all records in the training set. As neural
prediction function 38 goes through the training process, the
prediction will gradually get closer and closer to actual
values.

At step 374 the results of training process steps 370 and
372 may be tested. The neural network's predictive
capability is tested on a test sample data set. The goal at step 374
is to see how well the system predicts the known behavior
without prior knowledge of that behavior. One way to test
this prediction is to feed the prediction into rule based
segmentation function 34 to analyze the overall prediction
accuracy of the network and also provide information on the
correlation between record profile and prediction accuracy.

At step 376 in neural prediction function 38 in FIG. 17 an
iterative refinement of the neural prediction process may
take place. As previously stated, this may involve using rule
based segmentation function 34 to compare the predictive
performance between the training data set and the test data
set. Significantly better predictive performance on the
training set may be indicative of over-learning and poor model
generalization in the network. The objective of training steps
370 and 372 is to produce a model that accurately reflects the
complex interrelationships between the input and output
parameters yet is sufficiently simple to be generic. The
trade-off between accuracy and generalism is controlled via
the number of input parameters chosen and their encoding
schemes. As with any statistical modeling system, the model
predictions must be carefully examined as well as the
sensitivity of the predictions to each input parameter. The
model setup may be interactively refined to reproduce the
required performance.

Once step 376 is complete, neural prediction function 38
may be exited. The steps of neural prediction function 38
have now generated a predictive network that may be used
to predict expected behavior from a data set.

FIG. 18 illustrates an exemplary neural network file
specification window 378, which in one embodiment of the
present invention is the main window for neural prediction
function 38. Window 378 preferably includes toolbar 380.
Toolbar 380 includes initiate new network configuration
button 382, retrieve network configuration button 384, save
data input configuration button 386, and save data input
configuration as button 388. Toolbar 380 also includes run
button 390 for creating a file specification and stop button
392 for stopping a neural network file specification process.
Exit button 394 in toolbar 380 closes neural prediction
function 38.

Neural network file specification window 378 also
preferably includes input data files section 396. Section 396
allows for browsing and selecting from available data files
in a given directory a data file for use with neural prediction
function 38. Raw binary files may be selected via select files
button 398, which pulls up an appropriate input file selection
dialogue box.

FIG. 19 shows an exemplary input file selection dialogue
box 400 for selecting a file for processing with neural
prediction function 38 and is exemplary of a dialogue box
available when select files button 398 in window 378 is
selected. Dialogue box 400 preferably includes available
files section 402 and selected files section 404. Using
include button 406, remove button 408, and replace button
410, files may be moved from available files section 402 to
selected files section 404, and vice versa. Once the
appropriate files are selected for processing, done button 412
closes dialogue box 400. Alternatively, the file selection
process may be canceled at any time by selecting cancel
button 414. Access to additional directories for selecting
files may be obtained by selecting directory button 416.

In one embodiment of the present invention, using input
file selection dialogue box 400 up to five input files may be
selected for processing. These files may be of the same
length and have disjoint parameters. It is preferred, however,
that a single file be created from the several files using the
data merge capability of the present invention described in
discussions relating to FIGS. 29 and 30.

Returning to FIG. 18, neural network file specification
window 378 also preferably includes parameter selection
region 418. In one embodiment of neural prediction function
38 of the present invention, the maximum number of input
parameters to a neural network is thirty. It is noted that the
maximum number of input parameters may be varied
without departing from the spirit and scope of the present
invention. Selecting the parameters for a model is
accomplished by choosing select parameters button 420 that
provides access to a window for specifying parameters.

FIG. 20 illustrates exemplary specify parameters window
422 for use in selecting the input and output parameters for
a neural network and is provided when select parameters
button 420 in window 378 is selected. Window 422
preferably includes available parameters section 424, which
provides a list of the parameters available for a data file.
Window 422 also includes input parameters section 426 for
selecting the input parameters for a neural prediction
network. Input parameters section 426 preferably includes
parameter name section 428 showing the names of the input
parameters and the scheme for the encoding of each
parameter, which will be described in discussions relating to
FIG. 21. Input parameters section 426 also has include
button 430 for adding a selected parameter in available
parameters section 424 to parameter name section 428 in
input parameters section 426. Remove button 432 removes
a selected parameter from parameter name section 428.
Encode button 434 in input parameters section 426 may be
selected to provide the encoding scheme for an input
parameter.

For each network input parameter, an encoding scheme
must be specified. In one embodiment of neural prediction
function 38 of data analysis system 10, three different
encoding schemes are supported and will be described in
discussions relating to FIG. 21. Also, the minimum and
maximum values for each input should be specified along
with the number of neurons over which the input value is to
be encoded. The minimum and maximum values are
typically the minimum and maximum values of the input
parameter. Exceptions may be necessary when the parameter
has a long-tailed distribution in which case some other value
may be selected, e.g., +/-4 standard deviations. Values of
the input parameter greater than the maximum specified
value are clipped to the maximum value, and parameter
values less than the minimum specified value are set to the
minimum value.

FIG. 21 shows an exemplary encode input parameter
dialogue box 436 that may be accessed when encode button
434 in input parameters section 426 of specify parameters
window 422 is selected. Dialogue box 436 may be used to
encode the input parameters and specify the minimum and
maximum input parameter values and the number of neurons
on which the input parameter is to be input.

As previously stated, one embodiment of the present
invention supports three types of input encoding schemes
including spread, clock spread, and one-in-N encoding.
Accordingly, dialogue box 436 preferably includes encoding
type section 438 for selecting the appropriate encoding
scheme for a parameter. Dialogue box 436 also includes
minimum value input 440, maximum value input 442, and
number of neurons input 444 for specifying these values for
a parameter. Once the information in dialogue box 436 is
complete, the selections in dialogue box 436 are accepted by
selecting OK button 446. Changes to an input parameter via
dialogue box 436 may be terminated at any time while in
dialogue box 436 by selecting cancel button 448.

With spread encoding, the parameter of interest is spread
across the neurons. If the number of encoding neurons in
number of neurons input field 444 is set to one, the minimum
input value is associated with an activation of zero, and the
maximum input value is associated with an activation of
one. The parameter value is then encoded by assigning an
activation linearly between these limits. Otherwise, the first
neuron is assigned a spot value corresponding to the
minimum value and the last neuron the maximum value. The
intermediate neurons are assigned spot values linearly from
the minimum to the maximum values. The activation values
for these neurons are then assigned so that the dot product
between the vector of activation values and the vector of
spot values equals the parameter value. Furthermore, the
sum of the activation values is equal to one and no more than
two neurons are activated. If a parameter value lies outside
the encoding range then it is encoded as if it was the
minimum value if it is less than the minimum value or
encoded as the maximum value if it is greater than the
maximum value. A typical example of a parameter that may
be spread encoded is a person's age, where a distribution is
required for thresholding purposes.

Clock spread encoding is similar to spread encoding
except that the first neuron is assigned two spot values (the
minimum and maximum), and the last neuron is viewed as
being adjacent to the first neuron in a circular fashion. This
method is useful for encoding times, angles, etc., because it
gives a smooth transition in activation values when the
parameter value goes full circle.

In one-in-N encoding, N neurons are defined. Only one
neuron is given an activation value of one with the rest of the
neurons receiving an activation value of zero. If a parameter
value is the minimum value, then the first neuron is
activated; if it is the minimum value plus one, then the second
is activated, etc. If the input parameter value is less than the
minimum value then the first neuron is activated. If the
parameter value is greater than the maximum value then the
last neuron is activated. The number of neurons must be the
maximum parameter value minus the minimum parameter
value plus one.

Additionally, in one-in-N encoding, each neuron
corresponds to a class. To encode a parameter value, a class for
each parameter is determined and the corresponding neuron
is given an activation of one. All other neurons are given an
activation of zero. Examples of parameters that may be
encoded using one-in-N encoding include marital status
(three neurons: married, single, and divorced), gender (two
neurons: male and female), and income (multiple neurons
corresponding to income ranges).

Returning to FIG. 20, specify parameters window 422
also preferably includes output parameters section 450,
which identifies the parameters to be predicted based on the
data set. Similar to input parameters section 426, output
parameters section 450 includes parameter name section
452, include button 454, remove button 456, and encode
button 458. The output parameters for a neural network may
be selected and encoded as previously described for the
input parameters in discussions relating to input parameters
section 426.

By selecting encode button 458 in output parameters
section 450, a dialogue box similar to encode input
parameter dialogue box 436 in FIG. 21 is provided. Output
parameters, however, are preferably encoded in one of two
schemes: spread and one-in-N encoding schemes.
Additionally, a single output parameter is preferably
specified in parameter name section 452 of output parameters
section 450.

In the outputs of a neural prediction network the dot
product between the vector of activation values and the
vector of spot values divided by the maximum activation
value equals the output parameter value, i.e., the predicted
value from the network. For spread output encoding with
only two neurons, the output values are taken to be the dot
product of the activations with the spot values, divided by
the sum of the activations. Where there are more than two
output neurons, the sum of the activations on each adjacent
pair of neurons should be determined. The sums for these
pairs are examined and the highest one is selected. The
activations and spot values for this pair of neurons may then
be used in decoding the output value. For one-in-N output
encoding, the neuron having the highest activation is noted,
and the output from the network is then the class that
corresponds to the highest activated neuron.
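
The encoding and decoding rules described above can be sketched in Python as follows. This is one reading of the text and not a definitive implementation: all function names are hypothetical, the decoder's weighted average over the selected neuron pair is an assumed interpretation, and the clock spread variant wraps out-of-range values around the circle, which the description does not state.

```python
def spot_values(minimum, maximum, n):
    """Spot values spaced linearly from minimum to maximum (requires n >= 2)."""
    step = (maximum - minimum) / (n - 1)
    return [minimum + i * step for i in range(n)]


def spread_encode(value, minimum, maximum, n):
    """Spread a value over n neurons; at most two adjacent neurons are active."""
    value = min(max(value, minimum), maximum)  # out-of-range values are clipped
    if n == 1:
        return [(value - minimum) / (maximum - minimum)]
    step = (maximum - minimum) / (n - 1)
    position = (value - minimum) / step
    lower = min(int(position), n - 2)
    upper_share = position - lower
    activations = [0.0] * n
    activations[lower] = 1.0 - upper_share
    activations[lower + 1] = upper_share
    return activations


def spread_decode(activations, minimum, maximum):
    """Decode from the adjacent pair with the highest summed activation."""
    n = len(activations)  # assumes n >= 2 and a non-zero best pair
    spots = spot_values(minimum, maximum, n)
    best = max(range(n - 1), key=lambda i: activations[i] + activations[i + 1])
    pair_sum = activations[best] + activations[best + 1]
    weighted = (activations[best] * spots[best]
                + activations[best + 1] * spots[best + 1])
    return weighted / pair_sum


def clock_spread_encode(value, minimum, maximum, n):
    """Circular variant: the last neuron is adjacent to the first."""
    step = (maximum - minimum) / n
    position = ((value - minimum) % (maximum - minimum)) / step
    lower = int(position) % n
    upper_share = position - int(position)
    activations = [0.0] * n
    activations[lower] += 1.0 - upper_share
    activations[(lower + 1) % n] += upper_share
    return activations


def one_in_n_encode(value, minimum, maximum):
    """One neuron per integer class between minimum and maximum."""
    n = maximum - minimum + 1
    index = min(max(value, minimum), maximum) - minimum
    return [1.0 if i == index else 0.0 for i in range(n)]


def one_in_n_decode(activations, minimum):
    """Return the class of the most highly activated neuron."""
    return minimum + max(range(len(activations)), key=lambda i: activations[i])


# Age 35 encoded over five neurons with spot values 20, 30, 40, 50, 60.
age_code = spread_encode(35, 20, 60, 5)
age_back = spread_decode(age_code, 20, 60)
```

In the age example the two middle neurons share the activation equally, so the dot product with the spot values (0.5 x 30 + 0.5 x 40) recovers 35.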

Continuing with FIG. 20, once the input parameters and 
output parameters are selected, specify parameters window 
422 may be closed via OK button 460. Alternatively, a 
parameter selection may be canceled at any time with cancel 
button 462. 

Returning to neural network file specification window 378
shown in FIG. 18, window 378 also preferably includes
generation status messages region 464. Region 464 provides
processing status information once a neural run is initiated
via run button 390. Typically, messages are generated for
every 1,000 records processed, followed by a "finish
processing at record N" message. Also, once the neural
processing is complete, generation status messages region 464
will provide other information, such as the name of the output
file, the encoding data file, the header data file, and that the
configuration has been saved in the output file. A neural run
may be suspended at any time via stop button 392.

Once a predictive network configuration has been speci- 
fied as described above, neural prediction function 38 may 
be run on a data set. With neural prediction function 38 of 
system 10, one or more networks that have been previously 
specified may be run on a data set. In one embodiment of the
present invention, up to thirty different runs, in which the
network is run in either training (learning) mode or predict
(recall) mode, are possible.

FIG. 22 illustrates an exemplary run neural network 
window 466 for running a network configuration on a data 
set. Run neural network window 466 preferably includes 
toolbar 468 having initiate new input configuration button 
470, open new data configuration button 472, save data input 
configuration button 474, and save data input configuration 
as button 476. Run button 478 in toolbar 468 initiates a run 
with a neural network. View analysis graph button 480 may 
be selected to display a graph that gives an indication of the 
status on a neural network's training. Exit button 482 in 
toolbar 468 may be selected to close neural prediction 
function 38. 

Run neural network window 466 includes data and con- 
figuration section 484 that displays a list of currently speci- 
fied network runs. In the example shown in FIG. 22, two 
batch runs have been specified. In the first run (Run 01),
parameter data is retrieved from a training file, using records 
from one up to 18,000 with ten iterative training cycles 
through this data. In the second run (Run 02), neural 
prediction function 38 is set for predictive, recall mode using 
records 18,001 through 20,000. 
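
The two batch runs in this example amount to training on one record range and recalling on another. A minimal Python sketch of such a run list is shown below; the BatchRun class and its field names are hypothetical and only mirror the values quoted from FIG. 22.

```python
from dataclasses import dataclass


@dataclass
class BatchRun:
    name: str
    mode: str            # "train" or "predict"
    first_record: int    # 1-based, inclusive
    last_record: int     # 1-based, inclusive
    cycles: int = 1      # passes through the selected records

    def select(self, records):
        """Return the slice of records this run operates on."""
        return records[self.first_record - 1:self.last_record]


runs = [
    BatchRun("Run 01", "train", 1, 18000, cycles=10),
    BatchRun("Run 02", "predict", 18001, 20000),
]

# Stand-in data: record numbers 1 through 20,000.
records = list(range(1, 20001))
training_records = runs[0].select(records)
recall_records = runs[1].select(records)
```

Keeping the recall range disjoint from the training range is what allows the second run to serve as an independent test of the trained weights.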

Run neural network window 466 of FIG. 22 also includes 
weights and results files section 486. Section 486 displays 
the selections made for these attributes. In the example 
shown in FIG. 22, Run 01 is using a random weights file for 
input purposes, but is storing the trained weights in file 
FEG.BWT; the results themselves are not being stored to a 
file. In Run 02 weights are input via the BATCH facility, but
trained weights are not being stored. The corresponding
results in Run 02 are stored in file FEG.BRS.

Network configurations section 488 is also provided
within run neural network window 466. Section 488
displays the options selected and defined for training or
prediction, respectively. Each network configuration in
section 488 is given a unique identifier, e.g., predict or train,
together with a textual description. Adding or editing these
definitions is possible by selecting add button 490 or edit
button 492, respectively, which provide an appropriate edit
network configuration window.

FIG. 23 illustrates an exemplary edit network configura- 
tion window 494 that in one embodiment of the present 
invention may be used to edit a network configuration. 
Window 494 preferably includes network configuration 
identification (ID) field 496 that shows the name of a 
particular network configuration ID. Description field 498 in 
window 494 provides a description of the network
configuration ID in field 496.

Edit network configuration window 494 of FIG. 23 also 
preferably includes network parameters section 500. Section 
500 includes number of middle neurons field 502, learning 
rate field 504, and momentum field 506. The number of 
middle neurons is typically set to a value equal to approxi- 
mately 25% of the number of input neurons. The learning 
rate parameter typically ranges between 0 and 1 and deter- 
mines the speed of the convergence of the training process. 
A typical learning rate is 0.3 as shown in FIG. 23, although 
some experimentation may be required. Small learning rates 
can lead to excessive training times, while high learning 
rates can result in a network failing to converge. The 
momentum parameter allows the network to avoid distor- 
tions in the training process and enables the network to 
evade local minima. The momentum parameter can also be 
thought of as a smoothing factor applied to the error rate/ 
correction process. Care should be taken, however, to avoid 
overshooting the global minimum error, i.e., the optimum 
solution. A typical value for the momentum parameter is 0.2. 
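
The interaction of the learning rate and momentum parameters can be illustrated with a short Python sketch. The text does not state the update rule used by neural prediction function 38, so the conventional gradient-descent-with-momentum formula below, together with the names `middle_neurons` and `momentum_step`, is an assumption made purely for illustration.

```python
def middle_neurons(n_inputs: int, fraction: float = 0.25) -> int:
    """Rule of thumb from the text: about 25% of the input neurons."""
    return max(1, round(n_inputs * fraction))


def momentum_step(weights, gradients, velocity,
                  learning_rate=0.3, momentum=0.2):
    """One conventional momentum update (an assumed rule, not the patent's).

    velocity <- momentum * velocity - learning_rate * gradient
    weights  <- weights + velocity
    """
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, gradients)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

With the typical values of 0.3 and 0.2, each correction carries over one fifth of the previous correction, which is the smoothing effect described above.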

Mode section 507 in edit network configuration window 
494 must be used to specify the mode of the run in either 
train or forecast mode. Also, user information section 508 is 
used to set whether a display and the display's interval are 
to be provided. When displayed, the frequency of display 
may be set in display interval field 510. 

In one embodiment of neural prediction function 38 of 
data analysis system 10, three training completion criteria 
can be used. For the first criterion the maximum number of 
training cycles is specified. One cycle through a data set is 
one complete pass through the training set during training. 
The second type of completion criteria involves setting an 
error goal. When the entire training set reaches the set error 
goal, the training is complete. The third completion criterion 
stops training when the error begins to increase over an 
evaluation data set. This particular method keeps the network 
from overfitting the training set, i.e., from fitting the 
training data so closely that it generalizes poorly to new data. 
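
The three completion criteria can be sketched as a single check made after each training cycle. The function name `should_stop`, its arguments, and the decision to treat one rise in evaluation error as "beginning to increase" are assumptions for illustration; the text does not specify how the rise is detected.

```python
def should_stop(cycle, train_error, eval_errors,
                max_cycles=None, error_goal=None, use_eval_stop=False):
    """Return the reason training should stop, or None to continue.

    cycle        -- number of completed passes through the training set
    train_error  -- error over the entire training set after this cycle
    eval_errors  -- evaluation-set errors recorded so far, oldest first
    """
    # Criterion 1: maximum number of training cycles reached.
    if max_cycles is not None and cycle >= max_cycles:
        return "max_cycles"
    # Criterion 2: the whole training set has reached the error goal.
    if error_goal is not None and train_error <= error_goal:
        return "error_goal"
    # Criterion 3: evaluation error has started to rise (assumed: one rise).
    if (use_eval_stop and len(eval_errors) >= 2
            and eval_errors[-1] > eval_errors[-2]):
        return "eval_error_rising"
    return None
```

A practical implementation might require the evaluation error to rise over several consecutive cycles before stopping, to avoid reacting to noise.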

Returning to FIG. 22, batch run set-up section 512 is also 
preferably provided in window 466. Batch run set-up section 
512 includes add button 514, edit button 516, and remove 
button 518. By selecting add button 514 or edit button 516 
a window for editing a run may be provided. 

FIG. 24 illustrates an exemplary edit run window 520 for 
editing a particular run. Window 520 preferably includes 
training data file section 522, forecast data file section 524, 
input weights section 526, output weights section 528, and 
results file section 530. Window 520 and its attendant 
sections allow for browsing and selecting a training data 
file via training data file section 522. Window 520 also 
provides for browsing and selecting a forecast data file via 
forecast data file section 524. Input weights section 526 
allows for selecting either random or batch weights. When 
batch weights are specified, the weights' file corresponding 
to the previous run will be used. Therefore, the batch option 
in section 526 is not available for the first run in a sequence. 
Output weights section 528 provides for browsing and 
selecting an output weights file. Results file section 530 also 
provides for browsing and selecting a results file. 
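
The chaining of weights between runs can be sketched as follows. The `Run` class and `resolve_input_weights` function are hypothetical names, and rejecting a batch run whose predecessor stored no weights file is an added assumption; the text states only that the batch option is unavailable for the first run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    name: str
    input_weights: str                       # "random" or "batch"
    output_weights_file: Optional[str] = None


def resolve_input_weights(runs: List[Run]) -> List[Optional[str]]:
    """For each run, return the weights file it starts from (None = random)."""
    resolved = []
    for index, run in enumerate(runs):
        if run.input_weights == "random":
            resolved.append(None)
        elif run.input_weights == "batch":
            if index == 0:
                raise ValueError("batch weights are not available for the first run")
            previous = runs[index - 1].output_weights_file
            if previous is None:
                # Assumption: a batch run needs weights stored by the prior run.
                raise ValueError(f"{runs[index - 1].name} stored no weights file")
            resolved.append(previous)
        else:
            raise ValueError(f"unknown weights option: {run.input_weights!r}")
    return resolved
```

For the two runs of FIG. 22, a random first run that stores FEG.BWT followed by a batch second run resolves to no file for Run 01 and FEG.BWT for Run 02.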

Edit run window 520 can also be used to specify the start 
and end record for a run in section 532. Section 534 in 
window 520 may be used to specify the start and end 
forecast record for a run. For example, if a file contained 
200,000 records, training on 190,000 records with testing on 
10,000 records could be specified with fields 532 and 534. 
Also the number of training cycles may be specified in 
training cycle field 536. The neural network's configuration 
may be set in network configuration section 538 to give it a 
textual identifier, e.g., train, test, or predict. Once the fields 
in edit run window 520 are appropriately modified, window 
520 may be closed via done button 540. Cancel button 542 
may be used at any time to cancel inputs to window 520. 
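
The start and end record fields amount to selecting two ranges from one file. The sketch below assumes that records are numbered from 1 and that both ends of a range are inclusive, and it assumes the first 190,000 records are the training portion; `select_records` is a hypothetical helper, not part of the described system.

```python
def select_records(records, start, end):
    """Return records numbered start..end, counting from 1 and inclusive."""
    if start < 1 or end < start or end > len(records):
        raise ValueError("record range is outside the file")
    return records[start - 1:end]


records = list(range(1, 200_001))        # stand-in for a 200,000-record file
training = select_records(records, 1, 190_000)
forecast = select_records(records, 190_001, 200_000)
```

Keeping the two ranges disjoint means the forecast records play no part in training, so the test results reflect performance on unseen data.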

Returning to FIG. 22, run neural network window 466 
also includes output options section 544. In one embodiment 
of neural prediction function 38 of data analysis system 10 
of the present invention, three output options as shown in 
output options field 544 in FIG. 22 are preferably provided. 
These output options include a text option, a graphic option, 
and an analysis graphs option. 

FIG. 25 illustrates an exemplary text neural network 
results window 545 corresponding to the text option in 
output options 544 in window 466 of FIG. 22. Window 545 
preferably includes status information section 546 that 
includes current run field 548 providing status information 
on the current run. Number of training cycles in run field 550 
specifies the number of training cycles for a particular run, and 
current cycle field 552 specifies the cycle of the run at its 
current state. 

Neural network results window 545 also preferably 
includes go button 554 for initiating a particular session. 
Done button 556 indicates once a particular run is complete. 
By selecting about button 558 information on the current run 
may be viewed, and by selecting pause button 560 the 
current run will be suspended. Results information section 
562 in window 545 presents textual information on the 
results of the current neural prediction. 

FIG. 26 illustrates an exemplary graphic neural network 
results window 564 corresponding to the graphic option in 
output options 544 in window 466 of FIG. 22. Window 564 
preferably includes status information section 546 as previously 
described. Window 564 also includes graph section 
566 providing a graphical representation of the status of a 
particular neural prediction run. The example in FIG. 26 
shows a graphical representation of the actual value of the 
variable being predicted, compared with the predicted value, 
for 48 examples in a training set. As training proceeds, the 
predicted value and the actual value should converge. 

When the analysis graphs option is selected in output 
options 544 in window 466 of FIG. 22, neural prediction 
function 38 of data analysis system 10 records the errors 
generated throughout the training run and saves them to a 
temporary file. Once the training is concluded, a graph of 
these errors may be viewed by selecting graph button 480 in 
toolbar 468 as shown in FIG. 22. This produces an appro- 
priate dialogue box for selecting the criteria for plotting the 
errors. 

FIG. 27 illustrates an exemplary graph definition dialogue 
box 570 that may be produced by neural prediction function 
38 when the analysis graphs option in output options 544 in 
window 466 of FIG. 22 is selected. Dialogue box 570 
preferably includes plot type section 572 for specifying 
whether the actual error or absolute error should be used in 
graphing the errors and also whether icons are to be used in 
plotting the analysis graphs. Also provided within dialogue 
box 570 is batch information section 574 that lists the 
number of batches currently in consideration. Once the 
information within dialogue box 570 is acceptable, OK 
button 576 may be selected. Alternatively, this operation 
may be canceled by selecting cancel button 578. 

FIG. 28 illustrates an exemplary post network graphical 
analysis window 568 having toolbar 569. Toolbar 569 
preferably includes a number of buttons for accessing graphing 
options and exit button 598. Window 568 shows exemplary 
absolute error plot 580 generated in accordance with 
neural prediction function 38 of data analysis system 10. The 
example of FIG. 28 displays a plot of absolute error over 19 
training intervals (cycles), and from FIG. 28 it can be seen 
that the error has reduced progressively from an average of 
above 0.738 down to 0.08 over this training interval. 
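
The difference between the two plot types can be illustrated with a small sketch. The text does not define "actual error"; the code below assumes it means the signed difference between predicted and actual values, and `cycle_error` is a hypothetical name.

```python
def cycle_error(actual, predicted, absolute=True):
    """Average error over one training cycle.

    With absolute=False the signed errors can cancel each other out,
    which is why the two plot types can look quite different.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    if absolute:
        errors = [abs(e) for e in errors]
    return sum(errors) / len(errors)
```

Computing this value once per cycle gives the series of points that a plot such as absolute error plot 580 would display.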

Data management function 42 of data analysis system 10 
of the present invention may be used to perform several 
operations on data produced by data processor 32. Typical 
operations available with data management function 42 
include data merge, data append, and data paring operations. 

FIG. 29 illustrates data merge function 582 in accordance 
with data management function 42 that allows for combin- 
ing two different files into a single file. Data merge function 
582 allows file 584 of length n records and containing x 
parameters to be merged with file 586, also of length n 
records but with y parameters, into new file 588 with n 
records containing x+y parameters. To facilitate the merge 
operation, input files 584 and 586 preferably do not share 
any parameter names in common with each other. 
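
A minimal sketch of the merge operation follows, assuming each file is held in memory as a list of records and each record as a mapping from parameter name to value. The name `merge_files` is hypothetical, and raising an error on shared parameter names is a stricter choice than the "preferably" in the text.

```python
def merge_files(first, second):
    """Merge two files record by record: n records, x + y parameters."""
    if len(first) != len(second):
        raise ValueError("both files must contain the same number of records")
    merged = []
    for left, right in zip(first, second):
        shared = set(left) & set(right)
        if shared:
            raise ValueError(f"shared parameter names: {sorted(shared)}")
        merged.append({**left, **right})
    return merged
```

Merging a file with one parameter per record and a file with two parameters per record yields records with three parameters each, matching the x+y result described for new file 588.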

FIG. 30 illustrates an exemplary data merge window 590 
for performing data merge function 582 depicted in FIG. 29. 
Data merge window 590 preferably includes toolbar 592 
having run button 594, stop button 596, and exit button 598. 
Data merge window 590 also preferably includes input files 
section 600 for specifying the files to be merged in first file 
field 600a and second file field 600b. Each file field has 
browse button 602 for providing a file list for selecting files 
for merging and parameter information 603 for providing 
information on the parameters in each file. 

Data merge window 590 also preferably includes output 
file section 604 for specifying the file destination and name 
for the merge of the input files specified in input files section 
600. Output file section 604 similarly includes browse 
button 606 for selecting the destination and name for the 
merged data files. Data merge window 590 also preferably 
includes conversion status region 608 that provides a textual 
description of the status of a data merge. 

FIG. 31 illustrates data append operation 610 for com- 
bining two files of the same type into a single file that may 
be part of data management function 42 of data analysis 
system 10. The two files, e.g., files 612 and 614 in FIG. 31, 
contain identical parameter lists in each file. The parameters 
in the input files should also be in the same order, with the 
files also having the same length. Appending file 612 with 
file 614 results in output file 616. 
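
The append operation can be sketched in the same spirit. Here each file is assumed to be a pair of a parameter-name list and a list of rows; `append_files` is a hypothetical name. The sketch checks that the parameter lists are identical and in the same order, but does not enforce the equal-length condition mentioned above.

```python
def append_files(first, second):
    """Append second to first; each file is (parameter_names, rows)."""
    names_a, rows_a = first
    names_b, rows_b = second
    if list(names_a) != list(names_b):
        raise ValueError(
            "parameter lists must be identical and in the same order")
    return list(names_a), list(rows_a) + list(rows_b)
```

Unlike a merge, which widens each record, an append leaves the parameter list unchanged and lengthens the file.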

FIG. 32 illustrates an exemplary data append window 618 
for performing data append operation 610 as shown in FIG. 
31. Data append window 618 preferably includes toolbar 
592 as previously described. Data append window 618 
preferably includes input files section 620 that includes 
directory button 622. By selecting directory button 622 and 
selecting a directory in directory field 624, a file listing in file 
listing section 626 is provided. Using include button 628 and 
remove button 630 the files to be appended may be specified 
and moved to selected files section 632. Once a file is 
selected in file listing section 626, file format section 634 
provides the parameter names within the file in parameter 
name section 636 and the number of parameters in the file 
in number of parameters field 638. 

Window 618 also preferably includes output file section 
640 having browse button 642 from which a listing of file 
names may be selected for the output file, e.g., output file 
616 in FIG. 31. Output file field 644 displays the output file 
name. Data append window 618 also preferably includes 
conversion status section 646 that provides status informa- 
tion on a particular data append operation once run button 
594 is selected. 

Data management function 42 also preferably includes a 
data paring capability that allows for paring down a file to 
contain only those parameters of interest. For example, a 
tagged file from neural clustering function 36 could be pared 
down to contain only the cluster identifications and other 
parameters required for further analysis. Another possible 
use of the data paring capability is for rationing stored data 
because of disk storage limitations. 
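
Paring a tagged file down to the parameters of interest can be sketched as a simple projection. The record layout (a mapping from parameter name to value), the name `pare_file`, and the example parameter names are assumptions for illustration.

```python
def pare_file(records, selected):
    """Keep only the selected parameters, in the order given."""
    if records:
        missing = [name for name in selected if name not in records[0]]
        if missing:
            raise KeyError(f"unknown parameters: {missing}")
    return [{name: record[name] for name in selected} for record in records]


tagged = [{"cluster_id": 3, "age": 30, "income": 10},
          {"cluster_id": 1, "age": 41, "income": 20}]
pared = pare_file(tagged, ["cluster_id", "age"])
```

The pared file keeps every record but drops the unselected parameters, which is what reduces its storage footprint.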

FIG. 33 illustrates an exemplary data paring window 648. 
Window 648 preferably includes toolbar 592 previously 
described. Window 648 includes files section 650 having 
input file field 652 and output file field 654 for specifying the 
name of the input file to be pared and the name the pared file 
is to be stored under. Both input 652 and output 654 file 
fields have a browse capability that may be activated using 
browse buttons 656. Window 648 also preferably includes 
parameter selection region 658 including not selected 
parameter list 660 and selected parameter list 662. Using 
include button 664, include all button 666, remove button 
668, and remove all button 670, the parameters may be 
moved between lists 660 and 662 to specify a desired data 
paring. 

Data paring window 648 also preferably includes conver- 
sion status section 672 that provides a textual description of 
the status of a particular paring operation once run button 
594 is selected. 

FIG. 34 shows an exemplary data output window 672 that 
provides access to a data output function within data acquisition 
and output function 40 of data analysis system 10 of the 
present invention. The data output function of data analysis 
system 10 allows for converting files processed by system 
10 back to a format that can be used by other programs. In 
the embodiment of system 10 where data is processed as a 
binary file, the data output function of data acquisition and 
output function 40 may convert a binary file into, for 
example, an ASCII text format file. 

Window 672 preferably includes toolbar 592 previously 
described. Window 672 preferably includes information 
section 674 having input file field 676, header file field 678, 
and output file field 680. A listing of file names for each of 
these fields may be accessed by selecting one of browse 
buttons 682. Output delimiter field 684 specifies the type of 
delimiter to be used in the output file. In the example shown 
in FIG. 34, space, tab, and comma are the available 
delimiters for the output file. 

Window 672 also preferably includes file information 
section 686 providing additional information on the input 
file including the number of records, the number of 
parameters, and the names of the parameters within the input 
file. Select all button 688 within file information section 686 
may be selected to include all of the records and their 
parameters in the input file for conversion to the output file. 

Data output window 672 also preferably includes record 
selection region 690 that allows for specifying the start 
record and finish record for the input file for conversion to 
the output file. Also, window 672 preferably includes con- 
version status region 692 that provides a textual description 
of the status of a data output operation once run button 594 
is selected. 
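
The data output step, with its delimiter and record-range choices, can be sketched with Python's standard `csv` module. The in-memory representation, the name `export_records`, and the decision to write the parameter names as a first line are assumptions; the text does not describe the layout of the output file beyond the delimiter.

```python
import csv
import io

DELIMITERS = {"space": " ", "tab": "\t", "comma": ","}


def export_records(names, rows, delimiter="comma", start=1, finish=None):
    """Render rows start..finish (1-based, inclusive) as delimited text."""
    finish = len(rows) if finish is None else finish
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=DELIMITERS[delimiter],
                        lineterminator="\n")
    writer.writerow(names)                    # assumed header line
    writer.writerows(rows[start - 1:finish])  # selected record range only
    return buffer.getvalue()
```

Leaving `finish` unset exports through the last record, which corresponds to converting the whole input file.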

Although the present invention has been described in 
detail, it should be understood that various changes, 
substitutions, and alterations can be made hereto without 
departing from the spirit and scope of the invention as 
defined by the appended claims. 

What is claimed is: 

1. A system for analyzing a data file containing a plurality 
of data records, each data record containing a plurality of 
parameters, the system comprising: 

an input for receiving the data file; and 
a data processor comprising a clustering function for 
clustering the data records into a plurality of clusters 
containing data records having similar parameters 
wherein the clustering function is further operable to 
generate a cluster map including a graphical depiction 
of the clusters, wherein the cluster map comprises a 
plurality of graphical elements each having a graphical 
depiction indicative of a number of records in a cluster. 

2. The system of claim 1 wherein the input is further 
operable to convert the data records into a processing format 
for the data processor. 

3. The system of claim 2 further comprising an output 
operable to convert the data records in the processing format 
back to their original format. 

4. The system of claim 2 wherein the data records in the 
data file are in ASCII format and are processed in binary 
format in the data processor. 

5. The system of claim 1 further comprising a data 
manager for manipulating the data file. 

6. The system of claim 5 wherein the data manager further 
comprises a data append function for appending data files. 

7. The system of claim 5 wherein the data manager further 
comprises a data merge function for merging data files. 

8. The system of claim 5 wherein the data manager further 
comprises a data paring function for paring parameters from 
a data file. 

9. The system of claim 1 wherein the cluster map is color 
coded to depict the relative number of records in each 
cluster. 

10. The system of claim 1 wherein the clustering function 
is further operable to provide statistics for each parameter 
for the records in a cluster. 

11. The system of claim 1 wherein the clustering function 
is further operable to provide a parameter graph for each 
parameter in the records in a cluster. 

12. The system of claim 1 wherein the clustering function 
further comprises a neural clustering function. 

13. The system of claim 1 wherein the data processor 
further comprises a prediction function for predicting 
expected future results from the parameters in the data 
records. 

14. The system of claim 13 wherein the prediction function 
further comprises a neural prediction function. 

15. The system of claim 1 wherein the data processor 
further comprises a segmentation function for segmenting 
the data records into a plurality of segments based on the 
parameters. 

16. The system of claim 15 wherein the segmentation 
function is further operable to provide statistics on the data 
records. 

17. The system of claim 15 wherein the segmentation 
function is further operable to segment the data records into 
a plurality of segments using segmentation logic. 

18. The system of claim 15 wherein the segmentation 
function is further operable to segment an existing segment 
into additional segments. 

19. The system of claim 1 wherein the clustering function 
is further operable to identify characteristic profiles for each 
group. 

20. The system of claim 13 wherein the prediction function 
employs a multi-layer perceptron network in predicting 
the expected future results. 

21. The system of claim 1 wherein the data records further 
comprise customer data records containing a plurality of 
customer parameters in each customer record. 

22. The system of claim 15 wherein the data records 
further comprise customer data records containing a plurality 
of customer parameters in each customer record and 
wherein the segmentation function is further operable to 
segment the customer data records into logical groups of 
customers. 

23. The system of claim 21 wherein the clustering function 
is further operable to cluster customer data records into 
statistically significant groups of customers. 

24. The system of claim 13 wherein the data records 
further comprise customer data records containing a plurality 
of customer parameters in each customer record and 
wherein the prediction function is further operable to predict 
customer behavior from the customer data records. 

25. The system of claim 13 wherein the data records 
further comprise customer data records containing a plurality 
of customer parameters in each customer record and 
wherein the prediction function is further operable to predict 
customer behavior from current customer data records. 

26. The system of claim 15 wherein the segmentation 
function is further operable to identify statistics for each 
segment. 

27. The system of claim 15 wherein the segmentation 
function is further operable to identify statistical distributions 
for each segment. 

28. The system of claim 15 wherein the segmentation 
function is further operable to generate a histogram for each 
parameter in the data records. 

29. The system of claim 15 wherein the segmentation 
function is further operable to generate a histogram for a 
data segment. 

30. The system of claim 1 wherein the clustering function 
is further operable to generate a histogram for each cluster. 

31. A system for analyzing a data file containing a 
plurality of customer data records, each data record containing 
a plurality of customer parameters, the system comprising: 

an input for receiving the data file; and 
a data processor for processing the data records, the data 
processor further comprising 
a segmentation function for segmenting the customer 
data records into a plurality of segments based on the 
parameters, 
a clustering function for clustering the customer data 
records into a plurality of customer groups having 
similar parameters, and 
a prediction function for predicting customer behavior 
from the customer data records. 

32. The system of claim 31 further comprising a data 
manager for manipulating the data file. 

33. The system of claim 32 wherein the data manager 
further comprises a data append function for appending data 
files. 

34. The system of claim 32 wherein the data manager 
further comprises a data merge function for merging data 
files. 

35. The system of claim 32 wherein the data manager 
further comprises a data paring function for paring parameters 
from a data file. 

36. The system of claim 31 wherein the segmentation 
function is further operable to provide statistics on the data 
records. 

37. The system of claim 31 wherein the segmentation 
function is further operable to segment the data records into 
a plurality of segments using segmentation logic. 

38. The system of claim 31 wherein the segmentation 
function is further operable to segment an existing segment 
into additional segments. 

39. The system of claim 31 wherein the clustering function 
is further operable to identify characteristic profiles for 
each customer group. 

40. The system of claim 31 wherein the prediction function 
is further operable to predict prospective customer 
behavior from current customer data records. 

41. The system of claim 31 wherein the segmentation 
function is further operable to identify statistical distributions 
for each segment. 

42. The system of claim 31 wherein the segmentation 
function is further operable to generate a histogram for each 
parameter in the data records. 

43. The system of claim 31 wherein the segmentation 
function is further operable to generate a histogram for a 
segment. 

44. The system of claim 31 wherein the clustering function 
is operable to generate a histogram for each cluster. 

45. The system of claim 31 wherein the clustering function 
is further operable to generate a cluster map depicting 
the number of records in each cluster. 

46. The system of claim 45 wherein the cluster map is 
color coded to depict the relative number of records in each 
cluster. 

47. The system of claim 31 wherein the clustering function 
is further operable to provide statistics for each parameter 
for the records in a cluster. 

48. The system of claim 31 wherein the clustering function 
is further operable to provide a parameter graph for 
each parameter in the records in a cluster. 

49. A method for analyzing a data file containing a 
plurality of data records, each data record containing a 
plurality of parameters, the method comprises the steps of: 

inputting the data file; and 
processing the data file by 
segmenting the data records into a plurality of segments 
based on the parameters, 
clustering the data records into a plurality of clusters 
containing data records having similar parameters, 
and 
predicting expected future results from the parameters 
in the data records. 

50. The method of claim 49 wherein the inputting step 
further comprises converting the data records into a predetermined 
processing format. 

51. The method of claim 50 further comprising the step of 
converting the data records in the processing format back to 
their original format. 

52. The method of claim 49 further comprising the step of 
appending data files together. 

53. The method of claim 49 further comprising the step of 
merging data files together. 

54. The method of claim 49 further comprising the step of 
paring parameters from a data file. 

55. The method of claim 49 wherein the segmenting step 
further comprises providing statistics on the data records. 

56. The method of claim 49 wherein the segmenting step 
further comprises segmenting an existing segment into 
additional segments. 

57. The method of claim 49 wherein the clustering step 
further comprises clustering the data records into groups 
having similar parameters. 

58. The method of claim 57 wherein the clustering step 
further comprises identifying characteristic profiles for each 
group. 

59. The method of claim 49 wherein the data records 
further comprise customer data records containing a plural- 
ity of customer parameters in each customer record. 

60. The method of claim 59 wherein the segmenting step 
further comprises segmenting the customer data records into 
groups of customers. 

61. The method of claim 59 wherein the clustering step 
further comprises clustering customer data records into 
statistically significant groups of customers. 

62. The method of claim 59 wherein the predicting step 
further comprises predicting customer behavior from the 
customer data records. 

63. The method of claim 59 wherein the predicting step 
further comprises predicting prospective customer behavior 
from the customer data records. 

64. The method of claim 49 wherein the segmenting step 
further comprises generating a histogram for each parameter 
in the data records. 

65. The method of claim 49 wherein the segmenting step 
further comprises generating a histogram for a data segment. 

66. The method of claim 49 wherein the clustering step 
further comprises generating a histogram for each cluster. 